Psychological Monographs:
General and Applied


Combining the Applied Psychology Monographs and the Archives of Psychology 
with the Psychological Monographs 


Vol. 77
1963


GREGORY A. KIMBLE, Editor


Duke University 
Durham, North Carolina 


Consulting Editors 


ANNE ANASTASI            JAMES J. JENKINS
FRED ATTNEAVE            HENRY F. KAISER
HAROLD P. BECHTOLDT      WILLIAM KESSEN
C. ALAN BONEAU           GARDNER LINDZEY
DON BYRNE                JANE LOEVINGER
ARTHUR R. COHEN          BOYD McCANDLESS
W. GRANT DAHLSTROM       QUINN McNEMAR
WILLIAM N. DEMBER        OSCAR A. PARSONS
CHARLES W. ERIKSEN       DAVID SHAKOW
NORMAN GARMEZY           CHARLES D. SPIELBERGER


HELEN ORR, Managing Editor
Joyce J. Carrer, Editorial Assistant 


Published by 
THE AMERICAN PSYCHOLOGICAL ASSOCIATION, INC.
1333 SIXTEENTH STREET, N. W., WASHINGTON, D. C. 20036 




CONTENTS OF VOLUME 77

Whole No.

564 SYSTEMATIC OPERATIONS IN SOLVING CONCEPT PROBLEMS: A PARAMETRIC STUDY OF A CLASS OF PROBLEMS.
Murray Glanzer, Janellen Huttenlocher, and William H. Clark.

565 RANGE-FREQUENCY COMPROMISE IN JUDGMENT. Allen Parducci.

566 TACTICS OF INGRATIATION AMONG LEADERS AND SUBORDINATES IN A STATUS HIERARCHY. Edward E. Jones,
Kenneth J. Gergen, and Robert G. Jones.

567 RETENTION AS A FUNCTION OF DEGREE OF LEARNING AND LETTER-SEQUENCE INTERFERENCE. Benton J.
Underwood and Geoffrey Keppel.

568 IMPRESSION FORMATION AS A FUNCTION OF ADJUSTMENT. Anthony J. Matkom.

569 TASK PERFORMANCE AND RESPONSES TO FAILURE AS FUNCTIONS OF IMBALANCE IN THE SELF-CONCEPT.
Harry Kaufmann.

570 PREDICTION OF THE FIRST YEAR COLLEGE PERFORMANCE OF HIGH APTITUDE STUDENTS. Robert C. Nichols
and John L. Holland.

571 PROPOSED MODEL OF EGO FUNCTIONING: COPING AND DEFENSE MECHANISMS IN RELATIONSHIP TO IQ
CHANGE. Norma Haan.

572 HERITABILITY OF PERSONALITY: A DEMONSTRATION. Irving I. Gottesman.

573 BRAIN INJURY IN THE PRESCHOOL CHILD: SOME DEVELOPMENTAL CONSIDERATIONS: I. PERFORMANCE
OF NORMAL CHILDREN. Frances K. Graham, Claire B. Ernhart, Marguerite Craft, and Phyllis W. Berman.

574 BRAIN INJURY IN THE PRESCHOOL CHILD: SOME DEVELOPMENTAL CONSIDERATIONS: II. COMPARISON OF
BRAIN INJURED AND NORMAL CHILDREN. Claire B. Ernhart, Frances K. Graham, Peter L. Eichman,
Joan M. Marshall, and Don Thurston.

575 TEMPORAL NUMEROSITY AND THE PSYCHOLOGICAL UNIT OF DURATION. Carroll T. White.

576 TEMPORAL DISCRIMINATION AND THE INDIFFERENCE INTERVAL: IMPLICATIONS FOR A MODEL OF THE
“INTERNAL CLOCK.” Michel Treisman.

577 MMPI DECISION RULES FOR THE IDENTIFICATION OF COLLEGE MALADJUSTMENT: A DIGITAL COMPUTER
APPROACH. Benjamin Kleinmuntz.


Vol. 77, No. 1


Psychological Monographs: General and Applied 


SYSTEMATIC OPERATIONS IN SOLVING 
CONCEPT PROBLEMS: 
A PARAMETRIC STUDY OF A CLASS OF PROBLEMS 


MURRAY GLANZER, JANELLEN HUTTENLOCHER 
School of Medicine, University of Maryland 


AND WILLIAM H. CLARK
Walter Reed Army Institute of Research, Washington, D. C. 


The monograph describes experimental studies of the factors that control effi- 
ciency of performance on the family of concept problems introduced by Smoke 
and studied by Hovland and other investigators. The effects of the following 
factors were examined: (a) whether the examples are positive or negative 
instances, (b) ratio of number of relevant dimensions to total number of 
example dimensions, (c) presence of superfluous information, (d) ordering 
of the examples, (e) amount of information that has to be stored at the begin- 
ning of a problem, (f) amount of information required to sort the dimensions 
into relevant and irrelevant, and (g) rate at which new information is pre- 
sented within the example series. The findings make it possible to specify the 
systematic operations carried out by Ss in solving the problems. 


Whole No. 564, 1963 


ALTHOUGH it has been shown that human
subjects solving problems carry out a
variety of systematic or rational operations
(Bruner, Goodnow, & Austin, 1956), these
operations have been ignored in most of the
experimental work on problem solving. They
are ignored because they are complex, do not
fit available formulations, and do not easily
lend themselves to reliable demonstration. The
first stage in establishing these operations as a
proper object of study and as a respectable
component of theory is to demonstrate their
existence in clear and simple form. Bruner,
Goodnow, and Austin (1956) have started
this work, and the experiments reported below
further it. The second stage is to demonstrate
experimental control over these systematic
operations and to carry out experimental
analyses of them. The experiments described
below are directed at this goal. In these ex-
periments a family of problems will be ex-
plored by experimental manipulation of the
characteristics of the problems.


1 This investigation was carried out under Contract
DA-49-007-MD-1004 between the Office of The Sur-
geon General and the University of Maryland. The
authors wish to thank Betty F. Morrison, Alice L.
Moskowitz, Elizabeth Engle, and Ometta Kearney for
their assistance in the work of the project.


The experiments may be viewed in either
of two ways. From a restricted point of view, 
they are parametric studies of the factors 
that determine the difficulty of a general type 
of concept problem introduced by Smoke 
(1932) and studied extensively by Hovland 
(1952), Hovland and Weiss (1953), Bruner, 
Goodnow, and Austin (1956), Cahill and 
Hovland (1960), Hunt and Hovland (1960),
and Hunt (1961). From a broader point of 
view, they are studies of the main type of 
systematic or rational operations elicited from 
the subject by the class of problems, and the 
factors that determine the appearance and 
optimal functioning of these systematic or 
rational operations. From the latter point of 
view, the family of concept problems used in 
the studies below is used primarily as a 
technique for eliciting systematic operations 
by the subject. 

The basic problem situation is one in which 
the subject is shown a series of examples in 
succession (e.g., blocks of various heights, 
widths, shapes, and colors). As each example
is shown, the subject is told that it is, or is 
not, a member of a certain concept class. On 
the basis of the examples and the information 
concerning their class membership, the sub- 




2 GLANZER, HUTTENLOCHER, AND CLARK 


ject is required to identify the criterial char- 
acteristics of the class (e.g., tall, red blocks). 
Investigators interested in systematic oper- 
ations have used this type of concept problem 
rather than one modeled after discrimination 
learning situations, because it is difficult to 
analyze out systematic operations in the 
latter type of problem. (See Levine, 1959, 
for an example of the complexity involved in 
analyzing systematic operations out of a 
discrimination learning situation.) In this 
type of problem, the series of examples often 
presents no more than the minimum amount 
of information necessary to define the concept. 

Information about systematic operations in 
the problem situation described above has 
been derived either (a) indirectly, by observ- 
ing the effects on performance of experimental 
variations in the example series, these varia- 
tions being dictated by a model of the process 
(Hovland & Weiss, 1953), or (b) by some 
variant of Aussage technique (Bruner et al., 
1956). The first method was used in the 
studies described below. 

The general procedure was to set up con- 
cept problems and then by appropriate ex-
perimental variation to (a) analyze the oper-
ations being carried out by the subject, and 
(b) determine the functional relations that 
control the speed with which these operations 
appear and the efficiency with which they 
function. 

The purpose of the studies was to develop 
reliable information on systematic operations 
in the solution of concept problems. If re- 
liable information about systematic opera- 
tions can be established, the following will 
have been accomplished. 


1. The basis for a fuller understanding of
the solution of concept problems will be 
established. Although human subjects can be 
shown to be using various systematic opera- 
tions during the course of problem solving, 
there is no reliable or extensive body of infor- 
mation based on experimental manipulation of 
these operations. These systematic operations
have, therefore, functioned only as spoilers of 
simple theoretical structures. Once these 
systematic operations are given clear experi- 
mental definition they can be used construc- 
tively in theories of problem solving. 


2. Methods will be explored that should be
useful in the experimental analyses of other
related types of complex performance. Thus,
many types of problem solving, such as the
solving of insight problems that seem resist-
ant to detailed experimental analysis may be
easier to handle once a body of reliable infor-
mation concerning one variety of systematic
operations is established.


Recent work on the family of concept 
problems introduced by Smoke² starts with a
paper by Hovland (1952), in which he carried 
out a detailed logical analysis of the informa- 
tion furnished by positive and negative 
instances in a wide variety of different prob- 
lems. Using the information from the analy- 
sis, Hovland and Weiss (1953) attempted to 
determine whether series of positive examples 
and series of negative examples providing the
same information would be handled equally
well by subjects. Implicit in this work is a
model which views the subject as enumerating
all possibly correct hypotheses and eliminat-
ing some of these hypotheses with each ex-
ample. The experimental results obtained by
Hovland and Weiss did not fit such a model.
Examination of the work of Bruner, Goodnow,
and Austin (1956), in which the various 
types of systematic operations carried out by 
the subjects were tabulated, suggests an ex- 
planation. Hypothesis enumeration and re- 
duction is a rare type of operation. The 
predominant operation is one in which the 
subject stores dimension values and selects 
relevant dimensions or eliminates irrelevant 
dimensions. Bruner, Goodnow, and Austin 
refer to this operation as a focusing, or 
wholistic, strategy. Subsequent work has been 
concerned with memory effects in series of 
negative examples (Cahill & Hovland, 1960), 
and in series containing both positive and 
negative examples (Hunt, 1961); and the 
order in which subjects considered different
types of concepts (conjunctive, disjunctive, 
relational) in solving problems (Hunt & 
Hovland, 1960). 
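
The dimension-elimination operation described above can be sketched in code. This is only an illustration, assuming that each example is encoded as a mapping from dimension names to values; the function name `focus_on_positives` and the block encoding are hypothetical and are not taken from Bruner, Goodnow, and Austin.

```python
def focus_on_positives(positive_examples):
    """Store the first positive instance, then drop every dimension whose
    value changes in a later positive instance (such a dimension cannot
    be part of a conjunctive concept)."""
    focus = dict(positive_examples[0])      # stored dimension values
    for example in positive_examples[1:]:
        for dim in list(focus):
            if example[dim] != focus[dim]:
                del focus[dim]              # eliminate an irrelevant dimension
    return focus


# Hypothetical three-dimension block examples, both positive instances.
examples = [
    {"height": "tall", "width": "wide", "color": "red"},
    {"height": "tall", "width": "narrow", "color": "red"},
]
concept = focus_on_positives(examples)
```

With these two positive instances, width varies and is dropped, so the remaining candidate concept is "tall, red." No list of hypotheses is enumerated at any point; only one stored example is revised.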


2 The original Smoke problems had relations be-
tween characteristics of the examples as their dimen-
sions. This type of dimension has been used infre-
quently in subsequent work.


PARAMETRIC STUDY OF CONCEPT FORMATION 3


Definition of Terms 


There are a number of terms that will be
used frequently below.

Number of dimensions in examples. A set
of concept examples may vary in one to n 
dimensions. Thus, a set of blocks used as 
concept examples may vary in height, width, 
color, texture, weight, and shape: six dimen- 
sions. A problem using six-dimension exam- 
ples will also be referred to as a six-dimen- 
sional problem. 

Number of values on the dimensions. Each 
dimension may vary over two to n values. 
For example, with respect to the height dimen- 
sion, blocks might be either 3 inches high or 
1 inch high; or they might have 20 values, 
going from ¼ inch to 5 inches in ¼-inch steps.

Number of relevant dimensions. The num-
ber of dimensions required to specify the
criterial characteristics in a problem may vary 
from one to the number of dimensions in the 
examples. For example, if the problem re- 
quires that the blocks be sorted according to 
height (tall versus short) the number of 
relevant dimensions is one. If, however, the 
problem requires selection according to height, 
width, and color (e.g., all tall, wide, red blocks 
go into one class), then the number of relevant 
dimensions is three. The number of relevant 
dimensions will also be referred to as the num- 
ber of dimensions in the concept. 

Number of irrelevant dimensions. Those 
dimensions that are not required for sorting 
the examples into categories are irrelevant 
dimensions. They are, of course, the number 
of example dimensions minus the number of 
relevant dimensions. 

Type of concept. Several different types of 
concepts are possible with two or more rele- 
vant dimensions: conjunctive, disjunctive, re- 
lational. The experiments below are concerned 
only with conjunctive concepts. These require 
given values on each of the relevant dimen- 
sions. Thus, “both tall and wide” is a two-
dimensional conjunctive concept.

Positive and negative instances. An ex-
ample that meets the requirements for mem-
bership in a specified category is called a pos-
itive instance of the category, or concept. An
example that does not meet these require-
ments is called a negative instance.
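
The terms defined above can be made concrete with a small sketch. It assumes a hypothetical set of blocks varying in three two-valued dimensions and the two-dimensional conjunctive concept "both tall and wide"; the names `DIMENSIONS`, `CONCEPT`, and `is_positive` are illustrative only.

```python
from itertools import product

# Hypothetical block set: three dimensions, two values on each.
DIMENSIONS = {
    "height": ("tall", "short"),
    "width": ("wide", "narrow"),
    "color": ("red", "blue"),
}

# A two-dimensional conjunctive concept: "both tall and wide".
CONCEPT = {"height": "tall", "width": "wide"}


def is_positive(example, concept):
    """A positive instance has the required value on every relevant
    dimension; any other example is a negative instance."""
    return all(example[dim] == value for dim, value in concept.items())


# Every possible example in this three-dimensional problem.
all_examples = [dict(zip(DIMENSIONS, values))
                for values in product(*DIMENSIONS.values())]

positives = [ex for ex in all_examples if is_positive(ex, CONCEPT)]
n_relevant = len(CONCEPT)                      # dimensions in the concept
n_irrelevant = len(DIMENSIONS) - n_relevant    # example dimensions minus relevant
```

Here there are eight possible examples, two of which (the tall, wide red block and the tall, wide blue block) are positive instances; color is the single irrelevant dimension.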


Alternative Procedures 


The experimental procedure used in study- 
ing performance on concept problems has 
considerable significance in determining the 
characteristics of the performance observed. 
To relate the findings of studies described 
below to other work, it is important, therefore, 
to make explicit the possible alternatives in 
the procedure and to specify the procedure 
used here in the light of these alternatives. 

Subject versus experimenter ordering. The 
procedure may be set up so that the subject 
picks his own sequence of examples, with the 
experimenter furnishing information about 
each example or response. Or it may be set 
up so that the experimenter picks the sequence 
of examples. The first procedure may be 
called subject ordered; this is the procedure 
Bruner, Goodnow, and Austin used when 
studying selection strategies. The second 
procedure may be called experimenter 
ordered; this is the procedure they used when 
studying reception strategies. In the experi- 
ments below, the sequences are experimenter 
ordered. 

Subject versus experimenter pacing. The
procedure may be set up so that the rate at
which examples are presented and the amount
of time they are exposed are determined
either by the subject or by the experimenter. The
Archer, Bourne, and Brown (1955) procedure 
was subject paced: the subjects were allowed 
to view each example for as long as they 
wished before assigning it to a category. The 
Oseas and Underwood (1952) procedure was 
experimenter paced: the subjects were re- 
quired to make their responses at a pace 
determined by the experimental apparatus. 
Experiments in which the procedure is experi- 
menter ordered are usually ones in which it 
is also experimenter paced. Similarly, subject 
ordered procedures are usually subject paced. 
This relationship, however, is not a neces- 
sary one. For example, Archer et al. (1955) 
used an experimenter ordered, subject paced
procedure. The experimenter paced procedure 
was used in the experiments below. 

Single-example versus final-response in- 
structions. The subject may be instructed to 
make a response to every example presented. 
He may be instructed to make a response only
after all the problem information is presented
or when he has all the necessary information. 
Instruction to respond to single examples 
fits more easily into procedures involving 
response-contingent reinforcement and has 
been favored by learning-oriented investiga- 
tors. Final response instructions are more 
usual with the Smoke-type of concept problem 
and were used in the experiments below.

One-problem versus multiple-problem series. 
Experimenters working with an approach 
derived from theories of learning have usu- 
ally examined the performance of subjects 
acquiring a single concept. For this particular 
approach, this procedure is appropriate since 
interest is centered on changes in performance 
within a problem. The procedure, moreover, 
gives reliable results since many measures are 
obtained for each subject. Most of the experi- 
menters working with the systematic opera- 
tions approach have examined responses over
a series of problems. If final response instruc- 
tions are used (see above), each problem 
generates a single observation for each subject. 
Use of multiple-problem series permits the 
development of reliable measures. Multiple- 
problem series, furthermore, permit the sub- 
ject to adjust his response to the type of prob- 
lem he is facing. In the experiments below, 
multiple-problem series will be used and at- 
tention will be given both to the development 
of the response and the asymptotic level of 
the response under a given set of conditions. 


Number of Categories 


An n-dimensional, conjunctive concept per-
mits sorting into a minimum of two cate-
gories and a maximum of $\prod_{i=1}^{n} V_i$ categories,
where $V_i$ is the number of values on the ith
relevant dimension. Thus, a two-dimension 
concept (height and width) with two values 
on each dimension permits a minimum of two 
categories (e.g., tall-wide and a residual cate- 
gory), and a maximum of four categories 
(tall-wide, tall-narrow, short-wide, short-
narrow). The task differs markedly when the
subject has to deal with two, as opposed to 
four, categories. Some investigators have 
noted this difference, with the statement that 


the subject is learning one concept in the 
former case and four concepts in the latter 
case (Archer et al., 1955). The experiments 
below use the two-category procedure: a 
specified class and a residual class. The two-
category procedure was introduced by Smoke 
and used by subsequent investigators with 
this family of problems. Here, and below, 
the term concept will be used for the set of 
criterial characteristics that define the speci- 
fied category. 
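
The bounds on the number of categories can be checked with a short calculation. The sketch below assumes only the product rule stated in this section; the function name `max_categories` is hypothetical.

```python
from math import prod

MIN_CATEGORIES = 2  # a specified class and a residual class


def max_categories(values_per_relevant_dimension):
    """Maximum number of categories: the product of the number of
    values on each relevant dimension."""
    return prod(values_per_relevant_dimension)


# Height and width, two values on each.
two_by_two = max_categories([2, 2])
```

For the height-and-width concept with two values on each dimension the maximum is four categories, as in the text.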


Information on Number of Relevant Dimen- 
sions 


Hovland has pointed out that telling the 
subject how many dimensions the concept has 
(before the problem series is presented), 
markedly affects the informational load of the 
task. For example, if the subject is informed 
that he will be shown eight-dimension exam- 
ples, each dimension having two possible val- 
ues, and that the solution will be a two-dimen- 
sion, conjunctive concept, then he is faced 
initially with

$$\frac{8!}{2!\,6!}\,(4) = 112$$

possible concepts. If the first example is 
positive, it reduces this number to 28. If, 
however, he is not told the number of relevant 
dimensions, then he is faced with

$$\sum_{r=1}^{8} \frac{8!}{r!\,(8-r)!}\,(2^{r}) = 6560$$


possible hypotheses. If the first example is 
positive, it reduces this number to 255. Hov- 
land’s procedure of specifying the number of 
relevant dimensions before the presentation 
of the problem was used in the experiments 
below. If it is not used, then certain example 
series such as those consisting of positive ex- 
amples can have several possible answers. 
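
The counts given above can be reproduced by direct computation. The following sketch assumes the counting rule implied by the text: a conjunctive concept is a choice of relevant dimensions together with one required value on each, and a positive first example fixes the required values so that only the choice of dimensions remains open. The function names are hypothetical.

```python
from math import comb


def concepts_of_size(n_dims, n_values, size):
    """Conjunctive concepts with exactly `size` relevant dimensions."""
    return comb(n_dims, size) * n_values ** size


def concepts_any_size(n_dims, n_values):
    """Conjunctive concepts when the number of relevant dimensions is unknown."""
    return sum(concepts_of_size(n_dims, n_values, r)
               for r in range(1, n_dims + 1))


def remaining_after_positive(n_dims, size=None):
    """Concepts still possible after one positive example: the example
    fixes the values, so only the subset of dimensions is undetermined."""
    if size is not None:
        return comb(n_dims, size)
    return 2 ** n_dims - 1   # every non-empty subset of dimensions


known_size = concepts_of_size(8, 2, 2)                # 112
known_size_after = remaining_after_positive(8, 2)     # 28
unknown_size = concepts_any_size(8, 2)                # 6560
unknown_size_after = remaining_after_positive(8)      # 255
```

The four values agree with the figures in the text: 112 and 28 when the subject is told that the concept has two dimensions, 6560 and 255 when he is not.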

In summary, the concept tasks considered 
below involved the following: multidimen- 
sional conjunctive concepts, experimenter 
ordered and experimenter paced presentation, 
two categories (an identified category and a 
residual category), final response, multiple- 
problem series, and specification of the num- 
ber of relevant dimensions. 




Form of Examples 


Concept examples used in previous studies 
have included blocks, Chinese ideograms, Eng- 
lish words, pictures of household objects, and 
drawings of complex geometric figures. These 
types of concept examples have a complicating 
characteristic: their several dimensions are 
not equally available to the subject. With a 
given set of examples, some concepts are more 
difficult than others because the criterial di- 
mensions are unlikely to be noticed. In the 
majority of the experiments below, concept ex- 
amples were used that equalized the perceptual 
availability of the dimensions. Blocks like 
those used in the Hanfmann-Kasanin (1937) 
test were first considered. The possible dimen- 
sions and values of a set of eight blocks may 
be summarized as follows: 


                 Dimensions
          Height   Width    Color
Values    tall     wide     red
          short    narrow   blue


Instead of presenting a tall, narrow, red block, 
it would be possible to present a card bearing 
the following: 


Height   Width    Color
tall     narrow   red


This card could be simplified by the use of 
symbols for both the dimensions and their val- 
ues, as follows: 

H   W   C
t   n   r

The upper-case letters stand for dimensions, 
the lower-case letters for dimension values. 
A short, wide, blue block would be encoded as 
follows: 

H   W   C
s   w   b

This form has the disadvantage of requiring 
that the subject memorize the meanings of the 
symbols. It is, however, not necessary to re- 
tain the original meanings of the symbols. The 
subjects may be instructed to notice whether 
there is an s or t in the first position, an n or
w in the second position, and an r or b in the
third position. With this type of material, it
is possible to find a match for any problem re-
quiring three two-valued dimensions.

[Figure 1: Part A, a row of three dimension figures; Part B, a row of eight dimension figures.]

FIG. 1. Simplified concept examples with dimen-
sions presented as figures: Part A. three dimensions;
Part B. eight dimensions.

The concept example may be simplified 
further. Instead of the letters H, W, C, it 
is possible to present a set of symbols, as in 
Figure 1A. Instead of listing a letter below 
each dimension symbol (diamond, circle, 
square) to stand for a value on the dimension, 
it is possible to use two values: black and 
white. These values may be signified by mak- 
ing each of the dimension figures either black 
or white. Thus, dimensions have been trans- 
lated into figures, and values on dimensions 
into colors. An eight-dimension example might 
then be presented as in Figure 1B, with ap- 
propriate values entered into the dimension 
symbols. This type of simplified concept ex- 
ample has been used by other investigators of 
concept attainment (Metzger, 1958; Shepard, 
Hovland, & Jenkins, 1961). Shepard et al. 
refer to this type of example as a distributed 
representation, as opposed to a compact repre- 
sentation, e.g., Hanfmann-Kasanin blocks. 
They present data showing that in the learn- 
ing of six different classifications, the “dis- 
tributed” and “compact” examples give essen- 
tially the same results, i.e., the ranking of 
difficulty of the six classifications is the same
for both types of examples.


EXPERIMENT I: EXPOSURE TIME AND NUM- 
BER OF EXAMPLE DIMENSIONS 


A preliminary experiment was carried out 
to determine the effects of two variables: (a) 
exposure time, the amount of time each ex- 
ample in the concept problem is shown, and 
(b) the number of dimensions in the ex- 
amples. The subjects solved two concept 
problems with two-dimension conjunctive solu-
tions of the same general form as those used
in the positive series of Experiment II below. 
The number of dimensions was varied by 
having examples with six- or eight-dimen- 
sion symbols, i.e., figures. Since all subjects 
solved two-dimension concepts, the variable 
of number of example dimensions could also 
be called the number of irrelevant dimensions. 
Six groups, consisting of approximately 20 
subjects each,³ were assigned to the cells of
a 3 × 2 factorial design formed by three levels
of exposure time: 1.75, 3.00, and 5.50 seconds
per example, and two levels of example
dimensions: six and eight. Half the subjects
were given the two problems in one order;
the other half were given them in the reverse
order. Significant effects on the number of
trials to solution as measured by means of 
repeated test series were found for both 
number of example dimensions (F=17.54, 
df=1/106, p < .001), and exposure time 
(F=4.22, df=2/106, p < .05). The results 
show a high degree of regularity with the 
means for the three exposure times declining 
linearly, and a marked difference between the 
six- and eight-dimension problem groups. 
The experiment indicated that the use of 
one or two problems was not sufficient for the 
investigation of performance on concept prob- 
lems since, with an obvious and strong vari- 
able like exposure time, and a total of 118 
subjects, it was barely possible to obtain 
significance at the .05 level. The findings of 
this experiment led to the use of much larger 
numbers of problems, the multiple-series 
procedure, as in the experiments below. 


EXPERIMENT II: EXAMPLE SIGN, CONCEPT
SIZE, SERIES COMPLEXITY


The purpose of this experiment was to re- 
peat some of the work initiated by Hovland 
and Weiss on the use of negative examples in 
solving concept problems, and to explore some 
of the variables controlling efficiency of per- 
formance with concept problems. The main 
variables in the experiment were: 


3 Since the groups were not all equal, varying be- 
tween 21 and 18, the analysis of variance mentioned 
below was carried out by the method of unweighted
means (Snedecor, 1946, p. 287). 


1. Example sign—the definition of concepts 
by positive, as opposed to negative, instances. 
Negative instances are used less efficiently 
than positive instances, even when the amount 
of information they convey is comparable 
(Hovland & Weiss, 1953). The attempt was 
made to match the characteristics of positive 
and negative example series as closely as pos- 
sible and then to compare performance on the 
two types of series. 


2. Concept size—the number of dimensions 
required to define the concept, i.e., the number 
of relevant dimensions. 


3. Series complexity—the addition of re- 
dundant examples to a set of examples that is 
sufficient to define the concept. 


Method 


Equipment. The concept examples were on slides 
that were shown with an automatic projector (Bausch 
and Lomb Balomatic 500) on a standard 4½ × 6 foot
screen. 

Problems. The problems were presented as series 
of successive examples. Each example consisted of a
set of eight figures in fixed order on a blue back- 
ground, with each figure colored either black or white. 
Seven examples are presented in Figure 2. The fig- 
ures correspond to the dimensions of a conventional 
concept example; the colors are the values on these 
dimensions. 

There were 45 conjunctive concept problems, equally
divided into problems with two-, four-, and six-
dimension solutions. The number of relevant dimen-
sions, i.e., the number of dimensions required to spec-
ify the solution, will be referred to as concept size.
Examples of two-, four-, and six-dimension concepts
are shown in Figures 2, 3, and 4. The problems were
further subdivided into three matched sets, each set
containing five problems of each concept size. The
sets of problems were given consecutively, permitting
the analysis of the effects of learning.

[Figure 2: seven successive examples, each a row of eight black or white figures, every example marked with a plus sign.]

FIG. 2. A two-dimension concept, “black diamond,
black X,” defined by a simple-positive series of exam-
ples. (The plus signs indicate that the examples are
identified as positive instances. The examples are
shown successively.)

[Figure 3: five successive examples, each a row of eight black or white figures, every example marked with a plus sign.]

FIG. 3. A four-dimension concept, “black diamond,
white square, white spade, black X,” defined by a
simple-positive series of examples. (The examples are
shown successively.)

The sequence of examples was arranged so that
there was a change in one dimension figure on each
successive example. Each dimension that had been
changed in one example was changed back to its ini-
tial value of black or white on the next example.⁴
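
The arrangement just described can be sketched as follows. The sketch assumes that every example after the first differs from the initial example in exactly one figure, which is how the description above is read here. The all-black starting example and the order in which figures are changed are hypothetical choices, and the function names are illustrative.

```python
FIGURES = ("diamond", "circle", "square", "spade",
           "X", "club", "heart", "triangle")


def flip(color):
    """Return the other of the two values, black or white."""
    return "white" if color == "black" else "black"


def origin_based_series(origin, dims_to_change):
    """Initial example followed by one example per changed dimension;
    each later example restores every other figure to its initial value."""
    series = [dict(origin)]
    for dim in dims_to_change:
        example = dict(origin)              # all figures back at initial values
        example[dim] = flip(origin[dim])    # change exactly one figure
        series.append(example)
    return series


# Simple-positive series for the concept "black diamond, black X":
# only the six irrelevant figures are changed, so every example is positive.
origin = {figure: "black" for figure in FIGURES}
relevant = ("diamond", "X")
irrelevant = [f for f in FIGURES if f not in relevant]
series = origin_based_series(origin, irrelevant)
```

The resulting series has seven examples: the initial positive plus one positive for each of the six irrelevant figures.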

Four alternative forms of the 45 problems were 
constructed: 


1. Simple-positive form—the examples were all
positive and the concept was defined by changes in
the irrelevant dimensions. Simple-positive problems
for each concept size are shown in Figures 2, 3, and 4.

4 This technique for changing examples is called
here origin based. For a comparison of the origin
based procedure with another type of procedure, see
Experiment VI.

5 The simple-negative form is the same in structure
as the series referred to by Hovland and Weiss as
“mixed positive and negative.”

[Figure 4: columns Number, Examples, Feedback; three successive examples, every example marked with a plus sign.]

FIG. 4. A six-dimension concept, “white diamond,
black circle, black spade, black X, white heart, black
triangle,” defined by a simple-positive series of exam-
ples. (The examples are shown successively.)

[Figure 5: columns Number, Examples, Feedback; an initial example marked with a plus sign followed by four examples marked with minus signs.]

FIG. 5. Simple-negative form for the four-dimen-
sion concept, “black diamond, white square, white
spade, black X.” (The minus signs indicate examples
identified as negative instances. Compare with
simple-positive form for the same concept—Figure 3.)

2. Simple-negative form⁵—an initial positive ex-
ample was followed by a series of negative examples.
The examples defined the concept by changes in 
the relevant dimensions. The four-dimension simple 
positive problem of Figure 3 is converted into a sim- 
ple-negative problem in Figure 5. In the simple-posi- 
tive form, a two-dimension concept requires the 
initial positive plus six positives; a four-dimension 
concept requires the initial positive plus four posi- 
tives; a six-dimension concept requires the initial 
positive plus two positives. In the simple-negative 
form (in which the concept is defined by changes in 
the relevant, instead of the irrelevant, dimension) a 
two-dimension concept requires the initial positive 
plus two negatives; a four-dimension concept re- 
quires the initial positive plus four negatives; and a 
six-dimension concept requires the initial positive 
plus six negatives. Thus, since there were an equal
number of two-, four-, and six-dimension concepts,
the subject who received the simple-negative form of
the problems viewed the same total number of ex-
amples as the subject who worked the simple-positive
form of the problems.

[Figure 6: columns Number, Examples, Feedback; eight successive examples, positive and negative instances intermixed.]

FIG. 6. Complex-positive form for the four-dimen-
sion concept, “black diamond, white square, white
spade, black X.” (Compare with Figures 3 and 5.)

The positive and negative forms of the problems 
were matched not only with respect to the number of 
instances, and the amount of information conveyed, 
but also, as will be noted below, with respect to the 
systematic operations that could be carried out on 
them. 

3. Complex-positive form—constructed by ran- 
domly inserting between the examples of the simple- 
positive form all but one of the negative examples 
from the simple-negative form. All complex-positive
problems consisted, therefore, of a series of eight 
examples (see Figure 6). The negative examples added 
redundant, but incomplete, information to the simple- 
positive series. The problems could only be solved 
by using all the information in the positive examples. 

4. Complex-negative form—constructed in the same 
way as the complex-positive. The information, how- 
ever, was carried by a basic simple-negative form, 
with all but one of the positives from the simple-posi- 
tive form inserted at random (see Figure 7). The 
additional positive examples added redundant, but 
incomplete, information. The problems could only be 
solved by using all the information in the negative 
examples. All complex-negative problems consisted
of eight examples. 

The four problem combinations detailed above 
formed a 2 × 2 factorial design, with two levels of
series complexity and two types of example series: 
positive and negative. The effect of type of example 
series will be referred to as the effect of the example 
sign. 

Procedure. The subjects received a minimum of 
information about the nature of the task or tech- 


[Figure 7: eight numbered example slides, each marked + or − as feedback.]


Fig. 7. Complex-negative form for the four-dimension
concept, “black diamond, white square, white
spade, black X.” (Compare with Figures 3, 5, and 6.)


niques for handling the problems. They were in- 
structed that they would see several series of ex- 
amples for each of which they were to discover a 
combination of figures that would be the answer. 
The following instructions⁶ were read to the subjects.


What we would like you to do is to solve a 
series of problems of the following nature. You 
will be shown sets of slides that look like these 
[show slides]. All slides will contain the same 
eight figures: diamond, circle, square, spade, X, 
club, heart, and triangle. These figures will be in 
the same order each time, but the color of some 
of the figures will change from slide to slide. Each 
figure may be either black or white. 

For each set of slides, we have picked out a 
combination of figures which we will call “the 
secret combination.” The figures in this combina- 
tion might be all black, all white, or some black 
and some white. After you have seen the slides, you 
will be asked to figure out what the secret com- 
bination is, and to write the answer down in your 
booklet. 

Let’s do a sample problem. Let’s suppose that the
secret combination is black circle and white spade. 
The slides you would be shown might be the fol- 
lowing. [Sample problem is shown. The sample 
problem consists of four examples, each of which 
is identified as either a positive or negative in- 
stance.] On the basis of these slides, you would 
be asked to figure out the answer and write it down 
in the booklet. In this case, you would write a B 
in the circle figure, and a W in the spade figure. 
Please do that now. Any questions? Turn the page. 

Now, let’s consider another sample problem. Sup- 
pose that the secret combination in this problem 
is black circle, white square, black club, black tri- 
angle. The slides you would be shown might be 
the following. [Another four-example sample prob- 
lem is shown and the classification of each example 
given.] On the basis of these slides you would write 
down the answer black circle, white square, black 
club, black triangle, by putting a B in the circle 
figure, a W in the square figure, etc. Will you please 
do that right now on page 1. 


⁶ These instructions were modified appropriately
for each experiment. In Experiments II and III reference
was made to positive and negative instances,
and feedback concerning example sign was given in 
both the sample problems and the main series of 
problems. (The form of feedback was: “Yes, this 
[slide] does contain the secret combination” or “No, 
this [slide] does not contain the secret combina- 
tion.”) In Experiment IV the references were to 
blocks and their dimensions, rather than to eight 
figures. The subjects were also told that all examples 
shown would contain the “secret combination” (i.e., 
were positive instances). In Experiments V and VI 
the subjects were told that all of the examples would 
contain the “secret combination” (i.e., were positive
instances), and that this combination would always 
contain four figures (i.e., was four-dimensional). 




Of course, in the regular experiment you will not 
know the answers beforehand. But in each prob- 
lem you will be given enough information so that 
you can figure out the answer. This is an im- 
portant point. In every problem, the set of slides 
will be sufficient to tell you what the secret com- 
bination is. And at the end of each problem, you 
will be given time to write your answer down in 
the booklet. Every problem can be solved without 
guesswork, on the basis of the information given. 
However, if you are not sure of your answer, guess. 


After the subjects had been shown two practice 
problems to acquaint them with the presentation pro- 
cedure and the method of writing their answers, they 
were shown their main series of 45 problems. Each 
example in a problem was shown for 9.75 seconds with 
a 2-second dark interval between successive slides. 
Every problem was shown twice in succession.⁷ The
subjects wrote their answers at the end of each 
presentation. The experimenter indicated, before each 
presentation, the number of relevant dimensions, iden- 
tified each example as a positive or negative instance 
by saying either “yes,” or “no,” and read the correct 
answer after the subjects had completed their second 
response. Approximately 15 seconds were allowed for 
the subjects to write their answers. They wrote 
their answers in booklets, each page of which had 
a set of figures corresponding to the example figures, 
by inserting Bs and Ws within the dimension figures, 
to indicate the relevant dimensions and their values. 
The trials were grouped into sessions of 20 trials each. 
The group therefore required 90 trials during 5 ses- 
sions to complete the 45 problems. The last session 
consisted of 10 trials. Each session took approxi- 
mately 45 minutes and was separated from the next 
session by a 5-minute rest period, with one exception. 
There was an hour interval between the third and 
fourth sessions. 


Subjects. There were four groups of Army enlisted 
men, each group containing 11 subjects. Each group 
was assigned to one of the four treatment combina- 
tions or types of example series—simple-positive, sim- 
ple-negative, complex-positive, complex-negative. The 
subjects were average or above average in Army intelligence
tests. The four groups did not differ significantly
in intelligence test scores (F = 1.279, df = 3/40).


⁷ The two-trial procedure was adopted after an experiment
was carried out with a one-trial procedure
in which each problem was given only once. The results
of this experiment indicated a higher amount of
variability than that found under the two-trial procedure.
Detailed comparison of the results from the
one-trial procedure with the results from the first
trial of the two-trial procedure reveals that the variables
of concept size and learning are acting in the
same way under both procedures. Comparison of
first trial and second trial data in these and subsequent
experiments also indicates that the two trials
are furnishing the same data.


TABLE 1 


ANALYSIS OF VARIANCE: EXPERIMENT II

Source                      df     MS           F
Between subjects            43
  Example sign (E)           1    193.760       4.262*
  Series complexity (S)      1    [illegible]   —
  E × S                      1       .730       —
  Error (b)                 40     45.464
Within subjects            352
  Concept size (C)           2    130.654      39.968****
  Learning (L)               2     83.715      20.216****
  C × L                      4      9.961       5.215****
  C × E                      2     13.488       4.126**
  C × S                      2     23.669       7.240***
  C × E × S                  2       .320       —
  L × E                      2      1.336       —
  L × S                      2    [illegible]   [illegible]
  L × E × S                  2      6.820       1.647
  C × L × E                  4      3.779       1.979
  C × L × S                  4      2.044       1.070
  C × L × E × S              4      4.378       2.292
  Error (1)                 80      3.269
  Error (2)                 80      4.141
  Error (3)                160      1.910

* p < .05.
** p < .025.
*** p < .005.
**** p < .001.

Results 


Each subject received nine scores, con- 
sisting of the number of correct answers on 
two-, four-, and six-dimension concepts in 
each of the three successive groups of prob- 
lems. Analysis of variance of these scores 
(see Table 1) indicated a significant effect of 
the example sign (p < .05) and learning (p 
< .001), but no significant effect for series 
complexity. The learning curves for the four 
experimental groups are given in Figure 8. 
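
The legible F ratios in Table 1 can be rechecked as the ratio of each effect's mean square to an error mean square. The sketch below does that arithmetic in Python. The pairing of effects with error terms (Error b for the between-subjects effect, Error 1 for effects involving concept size alone, Error 2 for effects involving learning alone, Error 3 for effects involving both) is inferred from the printed values and is an assumption, not something the table states; the names `ERROR_MS`, `EFFECTS`, and `check_table` are illustrative only.

```python
# Error mean squares as printed in Table 1 (Experiment II).
ERROR_MS = {"b": 45.464, "1": 3.269, "2": 4.141, "3": 1.910}

# (effect, mean square, ASSUMED error term, F as printed in Table 1)
EFFECTS = [
    ("Example sign (E)", 193.760, "b", 4.262),
    ("Concept size (C)", 130.654, "1", 39.968),
    ("Learning (L)", 83.715, "2", 20.216),
    ("C x L", 9.961, "3", 5.215),
    ("C x E", 13.488, "1", 4.126),
    ("C x S", 23.669, "1", 7.240),
    ("L x E x S", 6.820, "2", 1.647),
    ("C x L x E", 3.779, "3", 1.979),
    ("C x L x S", 2.044, "3", 1.070),
    ("C x L x E x S", 4.378, "3", 2.292),
]


def f_ratio(effect_ms, error_ms):
    """F is the effect mean square divided by the error mean square."""
    return effect_ms / error_ms


def check_table(effects=EFFECTS, errors=ERROR_MS, tol=0.002):
    """Return (name, recomputed F, printed F, agrees) for every effect."""
    rows = []
    for name, ms, term, printed in effects:
        f = f_ratio(ms, errors[term])
        rows.append((name, round(f, 3), printed, abs(f - printed) <= tol))
    return rows


if __name__ == "__main__":
    for row in check_table():
        print(row)
```

Under the assumed pairing, every recomputed ratio agrees with the printed F to within rounding, which suggests the pairing is the one the analysis used.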

The effect of concept size was highly signif- 
icant (p < .001), with the four-dimension 
concepts solved less frequently than either 
the two- or six-dimension concepts (see Figure 
9). Separate comparisons of means with t
tests (df = 10) for each experimental group
found the difference between the two- and 
four-dimension means significant at the .05 
level or better, in all four groups; the differ- 
ence between the four- and six-dimension 
means significant in all but the complex- 
negative group; and the difference between 
the two- and six-dimension means significant 




[Figure 8 plot: mean number correct (vertical axis) against successive sets of problems (horizontal axis), with curves labeled SP, CP, SN, and CN.]

Fig. 8. Experiment II: Learning curves for the
four experimental groups: simple-positive (SP), 
simple-negative (SN), complex-positive (CP), and 
complex-negative (CN). (Each point represents 15 
problems.) 


in only the complex-positive and complex- 
negative groups. 

Reduction of the degrees of freedom, in 
line with the Greenhouse-Geisser (1959) 
recommendation for repeated measurements 
designs, did not affect the significance of the 
main within-subjects variables (learning and 
number of dimensions), but reduced the 
significance levels of the interactions: concept 
size with learning (p < .05), concept size 
with example sign (p < .05), concept size 
with series complexity (p < .025). The 
interactions reflect differences in the distances 
between the means for the three concept sizes 
under different experimental conditions.
Under all conditions, however, the ranking 
remains the same. Thus, as learning pro- 
gresses, the order of the means is always two- 
dimensional, highest; six-dimensional, next; 
and four-dimensional, lowest, but the two- 
dimensional means show greater improvement 
than the six-dimensional. 

The errors made by the simple-positive 
and simple-negative groups were examined in 
detail. If, as the evidence discussed below 
seems to indicate, the subjects are carrying 


[Figure 9 plot: mean number correct (vertical axis) against concept size 2, 4, and 6 (horizontal axis), with one curve per example-series form and a curve labeled III.]


Fig. 9. Experiment II: Mean number of correct
answers as a function of concept size and the form of 
the example series: simple-positive, simple-negative, 
complex-positive and complex-negative. (The curve 
labeled “III” is for a group discussed in Experi- 
ment III below.) 


out two steps in solving the problems, then 
two types of errors can be distinguished: 
errors in selecting dimensions and errors in 
the storage or specification of dimension 
values. An error in selecting dimensions was 
defined as one in which the subject chooses, 
for example, “diamond and square” as the 
relevant dimensions, when the correct com- 
bination is “circle and square.” A selection 
error can be assigned to a specific stage in the 
example series. Thus, in a simple-positive 
series, if the diamond changed in value in the 
fourth example following the initial example, 
it should be eliminated as an irrelevant 
dimension at that stage. Its appearance in the 
answer was, therefore, classified as a selection
error, Stage 4.

The simple-positive group tended to make 
its errors in the early stages, whereas the 
simple-negative group tended to make its 


8 The complex series, because of the redundant 
examples they included, did not permit simple iden- 
tification of the stages at which dimension-selection 
errors occurred. 




errors in the later stages of the series.⁹ In the
simple-positive group, seven subjects made 
more early errors than late errors; in the 
simple-negative group, only one subject made 
more early errors. The difference between 
the proportion of subjects making early errors 
is significant at the .02 level. 

An answer was classified as containing an 
error in the specification of the dimension 
values if it had a dimension with a value 
different from the value of that dimension in 
the initial example. Thus, if the answer 
given was “white circle and black square” 
instead of “black circle and black square,” 
the error was considered a value-specification 
error. An answer could contain, of course, 
both a value-specification error and a dimen- 
sion-selection error. 
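
The two error definitions above can be written as a small classifier. This is only an illustrative sketch: the function name `classify_errors` and the dictionary representation (figure name mapped to "B" or "W") are assumptions made for the example, not part of the original scoring procedure.

```python
def classify_errors(answer, correct, initial):
    """Classify one written answer.

    answer, correct: dicts mapping the chosen figures to 'B' or 'W'.
    initial: dict giving the color of every figure in the initial example.

    A dimension-selection error means the set of figures chosen differs
    from the set in the correct combination. A value-specification error
    means some chosen figure is given a color different from its color in
    the initial example. Both kinds can occur in the same answer.
    """
    selection_error = set(answer) != set(correct)
    value_error = any(initial[fig] != color for fig, color in answer.items())
    return {
        "dimension_selection": selection_error,
        "value_specification": value_error,
    }


if __name__ == "__main__":
    figures = ["diamond", "circle", "square", "spade",
               "X", "club", "heart", "triangle"]
    initial = {fig: "B" for fig in figures}       # hypothetical initial slide
    correct = {"circle": "B", "square": "B"}
    # "white circle and black square" instead of "black circle and black square"
    print(classify_errors({"circle": "W", "square": "B"}, correct, initial))
```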

Value-specification errors may be con- 
sidered errors in storage of the initial example. 
Of a total of 490 incorrect answers made by 
the simple-positive group, 191 (39%) in- 
volved value-specification errors. Of a total 
of 620 errors made by the simple-negative 
group, 368 (59%) involved value-specifica- 
tion errors. 


Discussion 


Hovland (1952) has analyzed the amount 
of information logically conveyed by series 
of positive and negative instances and has 
shown why more negative instances than 
positive instances are ordinarily required to 
define conjunctive concepts. It might be 
assumed that Hovland’s logical model for the 
transmission of concept information furnishes 
a psychological model for the performance of 
the subject solving concept problems. Two 
pieces of evidence contradict this assumption. 


1. Hovland and Weiss’ (1953) data show 
that logically equivalent concept tasks are 


⁹ A dimension-selection error was categorized as
“early” if it was assigned to the first half of the series
of stages. The categorization for the simple-positive
form may be summarized as follows:

Concept size   Number of dimension-selection stages   Stages included as “early”
2              6                                      1-3
4              4                                      1-2
6              2                                      1


not psychologically equivalent. Fewer sub- 
jects solve problems with negative instances 
than with positive instances, even though 
both sets of instances convey the same amount 
of information. This difference holds when 
the same number of instances are given for 
both sets and also with both successive and 
simultaneous presentation of examples.


2. Hovland’s logical model works by the 
enumeration of all possible correct hypotheses 
and elimination of sets of these as each ex- 
ample is presented. Bruner et al. (1956) 
label this operation “scanning” when it is 
carried out by their subjects, and report that 
it is rarely adopted for this general type of 
problem. 


The basic difficulty in making use of 
Hovland’s logical model as a psychological 
model is its use of the hypothesis as the basic 
informational unit. On the basis of the data 
of Bruner et al., it is clear that, for the sub- 
ject, the basic unit is the dimension and that 
the subject is not eliminating hypotheses, 
but selecting or eliminating dimensions. The
problems used in the experiments described 
here were structured to permit operations on 
the dimensions. They were matched so that 
essentially the same operations would be 
carried out with both positive and negative 
example series. In problems with positive 
examples, the subjects could simply store the 
values of the initial example and then elimi- 
nate irrelevant dimensions on the basis of 
changes between successive examples. In
problems with negative examples, they could 
store the initial example values and then 
select relevant dimensions on the basis of 
changes between successive examples. 
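
The two dimension-based operations described above can be sketched in a few lines of Python. The sketch assumes, as in the simple series, that every example after the first differs from the initial example in exactly one figure; the function names and the dictionary representation are hypothetical and serve only to make the two operations concrete.

```python
FIGURES = ("diamond", "circle", "square", "spade",
           "X", "club", "heart", "triangle")


def solve_positive_series(examples):
    """All examples are positive; examples[0] is the initial example.

    Store the initial values, then eliminate every figure whose color
    changes in a later positive example (it must be irrelevant).
    """
    initial = examples[0]
    kept = dict(initial)
    for example in examples[1:]:
        for fig in list(kept):
            if example[fig] != initial[fig]:
                del kept[fig]
    return kept


def solve_negative_series(initial, negatives):
    """initial is the single positive example; negatives follow it.

    Store the initial values, then select every figure whose color
    changes in a negative example (the change destroyed the concept,
    so the figure must be relevant).
    """
    selected = {}
    for example in negatives:
        for fig, color in initial.items():
            if example[fig] != color:
                selected[fig] = color
    return selected


def flip(example, fig):
    """Return a copy of the example with one figure's color reversed."""
    changed = dict(example)
    changed[fig] = "W" if example[fig] == "B" else "B"
    return changed


if __name__ == "__main__":
    initial = {fig: "B" for fig in FIGURES}
    concept = {"circle": "B", "square": "B"}
    positives = [initial] + [flip(initial, f) for f in FIGURES if f not in concept]
    negatives = [flip(initial, f) for f in concept]
    print(solve_positive_series(positives))
    print(solve_negative_series(initial, negatives))
```

Note that the positive solver shrinks its stored set as it goes, while the negative solver must keep the whole initial example available until the series ends.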

That the subjects are carrying out the 
systematic operation of dimension selection, 
rather than hypothesis elimination, is in- 
dicated by the V shaped functions generated 
by the variable of concept size. The ordinary 
expectation is that increase in the number of 
relevant dimensions would cause a monotonic 
decrease in the number of solutions. This 
would also follow from any theory in which 
the subject is viewed as processing hypotheses, 
since the number of possible correct hypoth- 
eses for two-dimension problems is 112; for 
four-dimension problems, 1,120; and for six- 




dimension problems, 1,792. In Hovland’s 
terminology the total amount of information 
to be communicated to the subject, i.e., the 
number of hypotheses to be eliminated, in- 
creases as number of relevant dimensions 
increases. If, however, the subject is eliminat- 
ing dimensions, sorting them into relevant 
and irrelevant, then the maximum amount of 
information is handled when the dimensions 
are halved. In the problems above, both two- 
dimension and six-dimension concepts require 
the partitioning of the dimensions into a set 
of two and a set of six. The four-dimension 
concepts require a four-four split of the ex- 
ample dimensions. From an information- 
theoretic view, the total amount of informa- 
tion gained in making the two-six and six-two 
partitions is equal. Furthermore, making 
these asymmetrical partitions involves less in- 
formation than making the four-four split, 
since, for a given number of assignments into 
two categories, a partition into two equally 
likely categories gives the greatest total infor- 
mational gain. It would be predicted, there- 
fore, that for any given number of dimensions 
the most difficult concept size will be one that 
requires an equal split of dimensions into 
relevant and irrelevant, and that as the con- 
cept size departs from the equal split, the 
concept would become increasingly easy to 
solve. Thus, the most difficult concept size 
for 6-dimension problems would be 3; for 
10-dimension problems, 5; for 12-dimension 
problems, 6, etc. More generally stated, it 
is asserted here that when the examples 
contain n dimensions, the maximally difficult 
concept size will be n/2.
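
The contrast between the two accounts can be checked numerically. The sketch below counts hypotheses as "choose which figures are relevant, then choose a color for each", which reproduces the figures 112, 1,120, and 1,792 quoted above, and it measures the load of the dimension partition as n times the binary entropy of the relevant proportion. That entropy measure is one reading of the information-theoretic argument in the text and is an assumption; the function names are illustrative.

```python
from math import comb, log2


def hypothesis_count(n_dims, concept_size):
    """Number of possible concepts: pick the relevant dimensions,
    then pick one of two values for each of them."""
    return comb(n_dims, concept_size) * 2 ** concept_size


def binary_entropy(p):
    """Entropy in bits of a two-category split with proportions p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def partition_information(n_dims, concept_size):
    """Total information in sorting n_dims dimensions into relevant and
    irrelevant when concept_size of them are relevant (ASSUMED measure)."""
    return n_dims * binary_entropy(concept_size / n_dims)


def hardest_concept_size(n_dims):
    """Concept size with the largest partition load. For even n_dims the
    maximum is unique; for odd n_dims the first of two tied sizes is returned."""
    return max(range(1, n_dims), key=lambda k: partition_information(n_dims, k))


if __name__ == "__main__":
    for k in (2, 4, 6):
        print(k, hypothesis_count(8, k), round(partition_information(8, k), 3))
```

The hypothesis count rises steadily with concept size, whereas the partition load is equal for two and six relevant dimensions and peaks at four, which is the pattern the V shaped function calls for.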

The V shaped function cannot be accounted 
for by such factors as the number of examples 
required or chance expectations. The four- 
dimension concepts were midway between the 
two- and six-dimension concepts in the num- 
ber of examples required. They are, however, 
the most difficult for all four experimental 
groups. With respect to chance expectations, 
the following argument might be made: that 
by chance, the subject is equally likely to select
two correct dimensions out of eight, or six
out of eight, $1/\binom{8}{2}$,
but less likely to select four out of eight,
$1/\binom{8}{4}$,
and that it is the inflation of the number of
correct responses by chance responses that 
generates the V shaped function. This inflation
would amount, at most, to 1/28 × 30, or
1.07, for the two- and six-dimension concepts
and 1/70 × 30, or .43, for the four-dimension
concepts. Correcting the means for
chance according to this scheme did not affect 
the order of the means at all, and had only a 
slight effect on the differences between the 
means. More elaborate corrections that com- 
bine separate probabilities for first and sec- 
ond trial chance corrections also did not affect 
the order of the means. 
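
The simple chance correction described above is easy to reproduce. The sketch below assumes that the multiplier 30 is the number of responses per concept size (presumably 15 problems, each answered twice); that reading is an inference, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def chance_inflation(n_dims, concept_size, n_responses):
    """Upper bound on the number of correct answers expected from picking
    the right set of dimensions purely by chance."""
    return Fraction(n_responses, comb(n_dims, concept_size))


if __name__ == "__main__":
    for k in (2, 4, 6):
        print(k, round(float(chance_inflation(8, k, 30)), 2))
```

Because the bound ignores value-specification errors, it overstates the number of correct answers that guessing could actually contribute.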

An explanation based on chance expecta- 
tions, along the lines of the one suggested 
above, is further ruled out because the cor- 
rection factors could have a marked effect on 
the curvilinearity of the V shaped function 
only if the subject never made errors in the 
storage or specification of dimension values. 
As was noted in the Results section, a fre- 
quent type of error was that of specification 
of values on the dimensions. 

The superior performance of the majority 
of the experimental groups on the two-dimen- 
sion concepts, compared with the six-dimen- 
sion concepts, requires separate consideration. 
The explanation may be in the greater impor- 
tance of storage of dimension values in the 
six-dimension concepts.

The finding that simple-negative problems 
(i.e., a positive example followed by negative 
examples) are more difficult, in agreement 
with the findings of Hovland and Weiss, has 
several possible explanations. Hovland and 
Weiss suggested that series of positive exam- 
ples are easier to solve because the positive 
examples each hold the concept in an organ- 
ized form before the subject’s eyes. If sub- 
jects are viewed as carrying out storage and 
selection of dimensions, then another explana- 
tion is possible. In a positive series, the sub- 
jects store the initial example and reduce 
the total amount that they must store as 
they make successive eliminations of irrele- 
vant dimensions. In a negative series, the 
subjects must hold every value in the initial 




example as relevant or possibly relevant until 
the selection procedure is completed because 
the examples do not permit the elimination of 
irrelevant dimensions; they only permit the 
selection of relevant dimensions. The storage 
load imposed on the subjects, therefore, may 
be viewed as dwindling with positive series. 
This view agrees with the finding that sub- 
jects solving negative series showed more 
storage errors, i.e., errors in specification of 
dimension value. 

In summary, this study has demonstrated 
the effect of two factors that determine the 
efficiency of solution of concept problems: 
(a) example sign (whether the examples are 
positive or negative), and (b) concept size 
(the number or proportion of relevant dimensions).
The findings on the effect of example
sign replicate and generalize the findings of 
Hovland and Weiss (1953), who used mark- 
edly different types of examples. The findings 
on the effect of concept size support the view 
that systematic processing carried out by the 
subject consists of: 


1. Value specification—this is primarily a 
simple storage function. 


2. Dimension selection—this involves the 
partitioning of the dimensions into relevant 
and irrelevant categories. The V shaped 
function relating concept size to efficiency 
indicates that a critical variable in determin- 
ing the efficiency of solving concepts is the 
information load on the dimension-selection 
function. When the information load on 
the dimension selection is high, concept work 
is handicapped. On information-theoretic
grounds, the load on the selection function 
would be highest when the problem requires 
an equal partitioning of the dimensions into 
relevant and irrelevant. 


It is now also possible to explain the 
marked differences that Hovland and Weiss 
found between the all-negative and mixed 
(referred to in this experiment as simple- 
negative) series. The all-negative series re- 
quires that the subject carry out the systema- 
tic operation of hypothesis enumeration and 
elimination. This is evidently impossible for 
the subject, even though Hovland and Weiss 
used simple three- and four-dimension exam- 


ple series that required only 12, 24, or 27 
initial hypotheses. (The subjects did, how- 
ever, do somewhat better than chance.) The 
simple-negative, or mixed, series permitted 
the subject to use the systematic operation of 
value specification and dimension selection. 


EXPERIMENT III: NUMBER OF DIMENSIONS,
AND INFORMATION ORDER


The simple-negative problems in Experi- 
ment II consisted of an initial positive exam- 
ple followed by a sufficient number of negative 
examples to define the concept (see Figure 5). 
The positive example can, however, be placed 
in any other position in the example series 
and, on a logical basis, define the concept 
equally well. The hypothesized processing 
carried out by the subject should be affected 
by the position of the positive example which 
is necessary to define the values of the rele- 
vant dimensions. If the subject is carrying out 
a process of dimension value storage and di- 
mension selection and the positive example is 
presented to him first, he must specify di- 
mension values first and then select dimen- 
sions. If the positive example is presented 
last, he must complete dimension selection and 
then specify dimension values. If, however,
the positive example is placed in the middle 
of the example series, the subject must start 
dimension selection, shift to storage or speci- 
fication of dimension values, and then return 
to the selection of dimensions. Hovland and 
Weiss found no difference between initial and 
final placement of the positive example. It 
might be expected, however, that medial 
placement of the positive example would 
yield the poorest performance, since this 
placement requires a double shift in the 
hypothesized processing. The purpose of this 
experiment was to explore further the effects 
of position of the positive example in such 
mixed example series. 


Method 


Equipment and procedure. The equipment and 
procedure were the same as that used in Experi- 
ment II. 

Subjects. The subjects were 11 Army enlisted men 
drawn from the same population as that used for the 
preceding experiments. They were average or above 
average on Army intelligence tests. 




Problems. The 45 problems presented to the simple- 
negative group of Experiment II were used, with the 
following changes. In one third of the problems, the 
initial positive example was moved to the medial 
position in the example series; in another third of the 
problems, the initial positive example was moved to 
the final position. The assignment of problems to the 
three information order conditions (initial, medial, 
and final) was done at random, with the restriction 
that an equal number of the two-, four-, and six- 
dimension problems were assigned to each order. The 
pattern of feedback for the medial position order 
was − + − for the two-dimension problems,
− − + − − for the four-dimension problems, and
− − − + − − − for the six-dimension problems.
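
The three information orders can be generated mechanically from the concept size, since a simple-negative series for a k-dimension concept has one positive example and k negative examples. The following is a minimal sketch; the function name and the use of "+" and "-" strings are illustrative choices.

```python
def feedback_pattern(concept_size, order):
    """Return the sequence of feedback signs for a simple-negative series.

    concept_size: number of relevant dimensions (one negative example each).
    order: 'initial', 'medial', or 'final' placement of the positive example.
    """
    negatives = ["-"] * concept_size
    position = {
        "initial": 0,
        "medial": concept_size // 2,
        "final": concept_size,
    }[order]
    return negatives[:position] + ["+"] + negatives[position:]


if __name__ == "__main__":
    for k in (2, 4, 6):
        print(k, " ".join(feedback_pattern(k, "medial")))
```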

In order to make sure that the three sets of prob- 
lems, assigned to each of the three experimental con- 
ditions here, were equal in difficulty, the following 
check was made. The data of the simple-negative 
group of Experiment II were divided into three sets 
corresponding to the sets of problems assigned to the 
three information orders in Experiment III, and 
their means compared by analysis of variance. The 
obtained F of 1.046 (df = 2/80) was not significant. 


Results 


Table 2 summarizes the analysis of vari- 
ance of the factors of information order and 
concept size. Both variables are significant: 
order, at the .05 level, and concept size, at 
the .001 level. Reduction of the degrees of 
freedom, in line with the Greenhouse-Geisser 
recommendation, leaves these variables signifi- 
cant at the .10 and .01 levels, respectively. 
Since the conservative test left the signifi- 
cance of the information order effect in doubt, 
the variance-covariance matrix for the data 
was tested to determine whether it met the as- 
sumption of equal variances and covariances. 
Box’ likelihood ratio test was applied and was 
found to give a chi square of 17.02 with 43 
degrees of freedom. The probability of this 


TABLE 2 


ANALYSIS OF VARIANCE: EXPERIMENT III

Source                     df     MS        F
Between subjects           10
Within subjects            88
  Information order (I)     2     6.344    3.444*
  Concept size (C)          2    16.040    8.708**
  I × C                     4     1.238    —
  Error (w)                80     1.842

* p < .05.
** p < .001.

[Figure 10 plot: mean number correct (vertical axis) against information order, Initial, Medial, and Final (horizontal axis), with one curve per concept size.]

Fig. 10. Experiment III: Mean number of correct
answers as a function of information order (position
of the positive example) and concept size.


result on the assumption of equal variances 
and covariances is greater than .95. It was
therefore decided that the assumption was met 
and that the use of the larger number of 
degrees of freedom for the F tests was justi- 
fied. The effect of information order may 
therefore be considered significant at the 
.05 level. 

The ordering of both main effects is highly 
consistent. The means for the three concept 
sizes are ordered as in the preceding experi- 
ments, with two-dimension concepts solved 
most frequently, six-dimension concepts next, 
and four-dimension concepts solved least frequently
(see Figure 9, the curve labeled
“III”). Problems with the positive example
placed at either the beginning or end of the 
series are consistently solved more often than 
problems with medial placement. The means 
for the three information orders at each of 
the three concept size levels are shown in 
Figure 10. Pairs of means from the three 
information orders were compared with t
tests. The difference between the initial and 
the medial order is significant (p < .05). The 
difference between the final and the medial 
order does not attain statistical significance 
(.10 > p > .05). These t tests were based
on 10 degrees of freedom. 




Since the group run here in Experiment 
III had 15 problems that were in the initial 
order, it was possible to compare performance 
on these problems with the corresponding 15 
problems in the simple-negative group of 
Experiment II. The mean for the 15 matched 
problems from the simple-negative group was 
10.7; for the group of this experiment, it was
6.1. This difference, however, was not signifi- 
cant (p > .10). 


Discussion 


The results further support the view that 
the subject’s performance on this type of 
problem may be analyzed into two distinct 
functions: dimension selection and dimen- 
sion-value specification. When these two func- 
tions are segregated, with either the dimension 
selection or value specification completed first, 
the subjects perform equally well. This was 
demonstrated by Hovland and Weiss and was 
replicated here. When, however, the subjects 
are required to start one, shift to the other, 
and then complete the first, they are handicapped.
This handicap might be explained
as a result of the additional load imposed on 
the subject by the shift. It might also be 
explained as a result of interference between 
the functions when the presentation does not 
permit their separation. In either case, the 
data can be handled by an approach that 
considers the subjects as carrying out a sys- 
tematic operation that consists of two stages: 
dimension selection and value specification. 


EXPERIMENT IV: CONCEPT SIZE—TEST OF
THE HALVING GENERALIZATION


In Experiments II and III it was found that 
with examples containing eight dimensions, 
concepts with four relevant dimensions were 
more difficult to solve than concepts with 
either two or six relevant dimensions. This 
finding held for problem types consisting of 
both positive and negative example series. It 
held for both simple and complex example 
series. The results were explained as a func- 
tion of the information load on the dimension- 
selection function and the following generali- 
zation was made: when the examples contain 
n dimensions, the concept size n/2 will be of 


maximum difficulty. Two questions arise con- 
cerning this generalization: 


1. Does it apply to concept problems con- 
structed of different types of examples than 
the special form of examples used in the 
preceding experiments? The finding may be a 
function of the special, distributed, type of example
used. The procedures and type of example
used in the preceding experiments have
been similar enough in their effects to replicate 
findings of Hovland and Weiss, who used 
different procedures and more “conventional” 
concept examples. (The replicated findings 
were those on the presence of differences be- 
tween simple-positive [all positive] and sim- 
ple-negative [mixed] problem series [Experi- 
ment II] and the absence of difference 
between initial and final placement of the 
positive example in a simple-negative problem 
series [Experiment III].) The curvilinear 
effect of concept size is, however, striking 
enough to warrant further study in order to 
rule out the explanations based on the special 
form of the examples used in the preceding 
experiments. 


2. Does it apply to concept problems in- 
volving numbers of dimensions other than 
those used in Experiments II and III? 


For the following experiments, concept 
problems were constructed, using examples 
like the Hanfmann-Kasanin blocks. The ex- 
periment was performed twice. 


Method 


Subjects. The subjects were soldiers who were 
average or above on Army intelligence tests. The first 
replication consisted of 22 subjects; the second repli- 
cation consisted of 23 subjects. 

Problems. The problems were simple-positive series,
as in Experiment II. The examples were pictures of
16 different blocks that varied in the following 
dimensions: height (tall or short), width (wide or 
narrow), shape (cylindrical or rectangular; identified 
to the subjects as round or square), and color (blue 
or yellow). Sixty concept problems were constructed, 
equally divided into problems with one-, two-, and 
three-dimension solutions. The problems were con- 
structed so that each dimension appeared as relevant 
equally often within each set of problems of a given 
concept size. As in Experiment II, each successive 
example in a problem changed in one respect from 
the initial example. Thus, a problem with a one- 
dimension solution consisted of four examples, a
problem with a two-dimension solution consisted of
three examples, and one with a three-dimension solu- 
tion consisted of two examples. It has already been 
demonstrated in Experiment II, by comparison of 
the positive and negative series, that the number of 
examples used to define the concept has no effect on 
the V shaped function obtained there (see Figure 9). 
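The construction rule described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' materials: the function names, the dictionary representation of a block, and the particular initial block are assumptions. Only the four dimensions, their values, and the rule that each successive example changes in one respect from the initial example come from the text.

```python
# Dimension names and values are taken from the text; the rest is illustrative.
DIMENSIONS = {
    "height": ("tall", "short"),
    "width": ("wide", "narrow"),
    "shape": ("round", "square"),
    "color": ("blue", "yellow"),
}


def flip(dimension, value):
    """Return the other value of a two-valued dimension."""
    first, second = DIMENSIONS[dimension]
    return second if value == first else first


def simple_positive_series(initial, relevant):
    """Build a series of positive examples (hypothetical helper).

    The first example is `initial`; each later example differs from the
    initial example in exactly one irrelevant dimension.
    """
    series = [dict(initial)]
    for dimension in DIMENSIONS:
        if dimension not in relevant:
            changed = dict(initial)
            changed[dimension] = flip(dimension, initial[dimension])
            series.append(changed)
    return series


def solve(series):
    """Keep the values that stay constant over all positive examples."""
    first = series[0]
    return {d: v for d, v in first.items() if all(ex[d] == v for ex in series)}


initial = {"height": "tall", "width": "narrow", "shape": "round", "color": "blue"}
one_dim = simple_positive_series(initial, {"color"})
two_dim = simple_positive_series(initial, {"color", "shape"})
three_dim = simple_positive_series(initial, {"color", "shape", "height"})
print(len(one_dim), len(two_dim), len(three_dim))  # 4 3 2
print(solve(two_dim))  # {'shape': 'round', 'color': 'blue'}
```

With four dimensions, a concept of size k leaves 4 - k irrelevant dimensions to eliminate, so the series has 4 - k + 1 examples, which reproduces the four, three, and two examples stated in the text.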

Procedure. The procedure was the same as that 
used in Experiment II, with each problem being 
shown twice. Each example was shown for 2 seconds. 
(Preliminary testing indicated that this was sufficient 
time for the subjects to view and respond to each 
example.) The subjects indicated their answers by 
circling items in lists of dimension values that were 
printed in a booklet. The subjects had approximately 
10 seconds to write their answers. 


Results 


Each subject received three scores: the 
number of correct responses on the one-, two-, 
and three-dimension problems. An error in
presentation of two of the problems in the 
first replication made it impossible to use the 
results from those two problems. The results 
for the first replication, weighted to make 
the numbers of problems comparable, are 
shown in Figure 11. Analysis of variance (see 
Table 3) of the scores, taking account of the 
different number of items in each category, 
shows a significant effect of concept size
(p < .001), with the middle concept size
(two relevant dimensions) most difficult.
Reduction of the degrees of freedom to 1 and
21 leaves the effect significant at the .005
level. In the second replication the same results
were obtained. The analysis of variance
is presented in Table 3 and the mean scores
on each concept size are also plotted in Figure 11.
The effect of concept size is significant
at the .001 level, with reduction of the degrees
of freedom leaving the significance
level unchanged.

FIG. 11. Experiment IV: Mean number of correct
answers as a function of concept size when blocks are
used as examples. (Labels: Group A, Group B;
horizontal axis: concept size, 1 to 3; vertical
axis: mean number correct.)
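The effect of reducing the degrees of freedom can be checked numerically from the F ratios printed in Table 3. The sketch below is an assumption-laden illustration, not the authors' computation: it evaluates the upper tail of the F distribution through the regularized incomplete beta function, integrated by Simpson's rule with the standard library only. The function names are hypothetical.

```python
import math


def regularized_incomplete_beta(x, a, b, steps=20000):
    """I_x(a, b) by Simpson's rule; adequate here because a >= 1 and x < 1."""
    if x <= 0.0:
        return 0.0
    h = x / steps

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (total * h / 3.0) / math.exp(log_beta)


def f_upper_tail(f_value, df1, df2):
    """P(F >= f_value) for an F distribution with (df1, df2) degrees of freedom."""
    x = df2 / (df2 + df1 * f_value)
    return regularized_incomplete_beta(x, df2 / 2.0, df1 / 2.0)


# Replication 1: F = 13.445 with the full (2, 42) and reduced (1, 21) df.
p_full = f_upper_tail(13.445, 2, 42)
p_conservative = f_upper_tail(13.445, 1, 21)
# Replication 2: F = 32.877 with the reduced (1, 22) df.
p_rep2 = f_upper_tail(32.877, 1, 22)
print(p_full < 0.001, 0.001 < p_conservative < 0.005, p_rep2 < 0.001)
```

Under these assumptions the first replication falls below .001 with the full degrees of freedom and between .001 and .005 with the reduced ones, while the second replication stays below .001, in line with the levels reported in the text.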


Summary 


The results verify the generalization made 
on the basis of Experiment II. Using very 
different types of concept examples from 
those used in Experiment II, and with a 
different number of example dimensions, the 
generalization holds. The concept size that 
halves the number of example dimensions is 
the most difficult. 
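One way to make the n/2 generalization concrete is to count the possible patterns of relevant dimensions. The sketch below assumes, purely for illustration, that the load on the selection function grows with the number of ways of choosing k relevant dimensions out of n. The monograph states the n/2 rule; this particular counting argument and the function name are assumptions.

```python
import math


def selection_patterns(n_dimensions):
    """Map each concept size k to (count, bits) for choosing k of n dimensions."""
    table = {}
    for k in range(1, n_dimensions):
        count = math.comb(n_dimensions, k)
        table[k] = (count, math.log2(count))
    return table


for n in (4, 8):
    table = selection_patterns(n)
    hardest = max(table, key=lambda k: table[k][0])
    print(n, hardest, table[hardest][0])
# 4 2 6
# 8 4 70
```

For the four-dimension blocks of Experiment IV the count peaks at a concept size of two, and for eight-dimension examples it peaks at four, which is the pattern the generalization describes.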


EXPERIMENT V: STORAGE LOAD AND 
SELECTION LOAD


In the preceding experiments the system- 
atic operations of the subject on the con- 
cept problems were viewed as consisting of 
two distinct functions: storage of dimension 
values and selection of dimensions. The max- 
imal difficulty of concept sizes that halve the 
number of example dimensions was interpreted
as the effect of information load on
the selection function. The purpose of this
experiment was to demonstrate the effects of
both storage load and selection load, and
furthermore, to demonstrate the effects of
selection load by a technique other than that
of varying the concept size, the proportion of
relevant dimensions. Developing operations
to manipulate both storage and selection load
involved three main steps: (a) preliminary
measurement of the perceptual difficulty of
the concept examples used in the experimental
work; (b) development of an index of storage
load, derived from the perceptual measures;
(c) development of an index of selection
load, derived from the perceptual measures.


TABLE 3

ANALYSES OF VARIANCE: EXPERIMENT IV

Source                  df      MS         F
Replication 1
  Between subjects      21
  Within subjects       44
    Concept size         2    202.032    13.445*
    Error (w)           42     15.026
Replication 2
  Between subjects      22
  Within subjects       46
    Concept size         2    177.928    32.877*
    Error (w)           44      5.412

*p < .001.


FIG. 12. Two positive eight-dimension examples
defining a four-dimension concept, “black circle,
black X, white club, black triangle.”


Preliminary Measurement of Perceptual Difficulty


The examples used in the concept problems 
were arrays of eight black and white figures 
in fixed order (see Figure 12). In a separate 
experiment (Glanzer & Clark, 1963), the 
subjects viewed each example for .5 second, 
and recorded immediately what they saw. 
The proportion of correct identifications for 
each example was used as an index of percep- 
tual difficulty. A perceptual index of .850 
meant that 85% of the subjects in the calibra- 
tion experiment reported the array correctly 
on the basis of a .5-second exposure. Thus, 
the higher the index, the lower the perceptual 
difficulty of the array. 


Development of an Index of Storage Load 


The index of storage load was constructed 
on the basis of the following assumption: an
array of values that is difficult to perceive
under short exposure will be difficult to store
even when exposure time is ample. In solving
the concept problems, it is assumed that
the subject stores the dimension values by
storing the values of the first positive example
he sees. The index of storage load for a 
problem was therefore identified with the 
perceptual difficulty of the first positive ex- 
ample in the series of examples that make up 
the problem. 
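The two definitions above can be written out as a short sketch. The calibration reports below are hypothetical and are not data from the study; the function names and the string representation of an array (B for black, W for white) are likewise assumptions made for illustration.

```python
def perceptual_index(reports, shown):
    """Proportion of calibration subjects who reported the array exactly."""
    correct = sum(1 for report in reports if report == shown)
    return correct / len(reports)


def storage_load_index(problem, indices):
    """Storage load of a problem: perceptual index of its first positive example."""
    return indices[problem[0]]


# Hypothetical calibration reports for one array (not data from the study).
shown = "BWBWWWWW"
reports = ["BWBWWWWW"] * 17 + ["BWWBWWWW"] * 3
index = perceptual_index(reports, shown)

# A problem is a series of positive examples; only the first one sets storage load.
problem = ["BWBWWWWW", "BWBBBBBW"]
print(index, storage_load_index(problem, {shown: index}))  # 0.85 0.85
```

Because the index is a proportion of correct reports, a high value corresponds to a low storage load, as the text notes for perceptual difficulty.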

The concepts of perceptual difficulty and 
storage load are operationally distinct. “Perceptual
difficulty” refers to a situation in
which the subject sees the example for a very 
brief period (.5 second) and immediately re- 
cords what he sees, without any intervening 
task. “Storage load” refers to a situation in 
which the subject sees each example for a 
relatively long period of time (10 seconds) 
and reports part of the information after an 
intervening period (during which subsequent 
examples are shown).


Development of an Index of Selection Load 


The index of selection load was also derived 
by making use of the perceptual data. The 
basis for the derivation is displayed in Figures 
12 and 13. Figure 12 shows a problem con- 
sisting of a pair of eight-dimension examples. 
If the subject is told that both of the examples 
in the problem are positive, i.e., are members 
of the concept class, and that there are four 
relevant dimensions that define the concept 
class, the answer is the one given below the 
problem. The values that stay constant in 
the two positive examples define the relevant 
dimensions and those that change define the 
irrelevant dimensions. 
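The rule in the preceding paragraph, that constant values mark relevant dimensions and changed values mark irrelevant ones, can be sketched as follows. The two arrays are illustrative strings (B for black, W for white) and do not reproduce Figure 12; the function names are assumptions.

```python
def relevance_pattern(first, second):
    """1 where the two positive examples agree (relevant), 0 where they differ."""
    return [1 if a == b else 0 for a, b in zip(first, second)]


def answer(first, second):
    """Values of the first example on the dimensions that stayed constant."""
    pattern = relevance_pattern(first, second)
    return [(position, value)
            for position, (value, keep) in enumerate(zip(first, pattern), start=1)
            if keep]


first = "BWBWWWWW"   # initial positive example
second = "BWBBBBBW"  # second positive example
print(relevance_pattern(first, second))  # [1, 1, 1, 0, 0, 0, 0, 1]
print(answer(first, second))             # [(1, 'B'), (2, 'W'), (3, 'B'), (8, 'W')]
```

The answer lists, by position, the four values that the subject would report as defining the concept class.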

A schematization of the same problem is 
given in Figure 13. Here the first line identifies
the initial values that the subject has to
store to solve the problem. On the basis of
the changes in the subsequent example, he
sorts the dimensions into relevant (represented
by a 1) and irrelevant (represented by
a 0). The subject solving the problem may
be viewed as combining two vectors (the
storage vector and the selection vector) to
give the answer in the line below them.


FIG. 13. Schematization of the problem in Figure 12
in terms of storage and selection vectors.

There is an obvious formal similarity be- 
tween the storage vector, which describes the 
values of the initial example, and the selection 
vector. Both consist of an array of items that 
take one of two values. In the case of the 
storage vector, the values are black or white. 
In the case of the selection vector, the values 
are relevant (represented by a 1) and irrele- 
vant (represented by a 0). The assumption 
was therefore made that the perceptual difficulty
of an array of black and white figures
could be used as an index for the difficulty in
handling the corresponding selection vector.
Thus, the perceptual difficulty¹⁰ of the array
B B W B W W B W could be used as an
index for the difficulty of the selection vector
1 1 0 1 0 0 1 0.

In order to manipulate the load on the storage
function, high and low load initial examples
were used. In order to manipulate the
load on the dimension-selection function, high
and low load selection vectors were chosen,
making use of the correspondence between
selection vectors and the perceptual arrays.


¹⁰ Either B B W B W W B W or its complement
W W B W B B W B could be used as a basis for
deriving an index for the 1 1 0 1 0 0 1 0 selection
vector. Fortunately, however, the perceptual difficulty
of an array and its complement showed high
correlation. A high selection load vector was therefore
identified with a pair of complementary arrays
in which both members had a high perceptual load.


FIG. 14. Construction of the four types of problem by combination of storage and selection vectors.
(Panel labels: low storage load, B W B W W W W W; high storage load, W B W B B W B B;
low selection load, 1 1 1 0 0 0 0 1; high selection load, 0 1 0 0 1 1 0 1.)


Method


Subjects. A group of 30 Army enlisted men served
as subjects. They were average or above average on
Army intelligence tests.

Problems. The problems were of the same general
type as those used in the preceding experiments. The
examples were arrays of black and white figures, as
before. Each problem, however, consisted of only
two positive examples. There were 80 conjunctive
concept problems, equally divided into the following
categories, according to the load imposed on storage
and selection:


LL: low storage load, low selection load 

LH: low storage load, high selection load 
HL: high storage load, low selection load 
HH: high storage load, high selection load 


Each problem consisted of a pair of examples so 
matched that they differed in four of the figures or 
dimensions. The examples were presented in pairs as 
positive instances of the concept. On the basis of the 
similarities and dissimilarities in the two slides, the 
subjects were required to select the relevant, and to 
eliminate the irrelevant, dimensions. An example of 
each type of problem is displayed in Figure 14. 

The 80 problems were constructed in the following 
way. Twenty low storage load examples (perceptual 
indices between .850 and .980) and 20 high storage 
load examples (perceptual indices between .075 and 
.220) were selected. Ten low selection load (indices 
between .480 and .850) and 10 high selection load vec- 
tors (indices between .150 and .280) were selected. 
(Each set of 10 consisted of five pairs of vectors, i.e.,
a vector and its complement: for example, 1 1 0 1
0 0 0 1 and 0 0 1 0 1 1 1 0.) Each of the 10 low selection
load vectors was combined with two different
low storage load initial slides and with two different 
high storage load initial slides. The same was done 
with the 10 high selection load vectors. 

The method of combining storage and selection 
vectors to construct a problem is shown in Figure 14. 
The mean perceptual difficulty of the second example 
was matched for the four experimental groups of 
problems. The order in which the problems were 
presented to the subjects was randomized. 
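The combination of a storage vector with a selection vector can be illustrated with a short sketch. The arrays and vectors are the ones printed as labels in Figure 14; the function names, the string representation, and the pairing of each array with each vector are assumptions made for illustration, not the authors' actual problem set.

```python
def complement(vector):
    """Swap the 1s and 0s of a selection vector."""
    return [1 - bit for bit in vector]


def second_example(first, vector):
    """Flip the color of every figure marked irrelevant (0) in the vector."""
    swap = {"B": "W", "W": "B"}
    return "".join(v if keep else swap[v] for v, keep in zip(first, vector))


def build_problem(first, vector):
    """A problem is a pair of positive examples differing on the irrelevant dimensions."""
    return (first, second_example(first, vector))


low_storage = "BWBWWWWW"   # low storage load example, as labeled in Figure 14
high_storage = "WBWBBWBB"  # high storage load example, as labeled in Figure 14
low_selection = [1, 1, 1, 0, 0, 0, 0, 1]
high_selection = [0, 1, 0, 0, 1, 1, 0, 1]

problems = {
    "LL": build_problem(low_storage, low_selection),
    "LH": build_problem(low_storage, high_selection),
    "HL": build_problem(high_storage, low_selection),
    "HH": build_problem(high_storage, high_selection),
}
for label, (first, second) in problems.items():
    print(label, first, second)

print(complement(low_selection))  # [0, 0, 0, 1, 1, 1, 1, 0]

# Design count from the text: 20 selection vectors, each combined with
# 2 low storage load and 2 high storage load initial slides.
total_problems = (10 + 10) * (2 + 2)
print(total_problems)  # 80
```

Each constructed pair differs in exactly four figures, as the text requires, and the count of 80 problems follows directly from the stated combinations.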

Procedure. The general procedure and the equipment
were the same as those used in the preceding
experiments. The subjects solved 80 problems, after 
being told that the problems would consist of only 
positive examples and that all problems would have 
four-dimension solutions. Each problem was given 
twice in succession; each example was shown for
10 seconds. The 80 problems were given in five sessions.
Each session consisted of 16 problems and
took approximately 35 minutes. All sessions were
separated by 10-minute rest periods, except the fourth 
and fifth, which were separated by a 90-minute 
interval. 


Results 


Both of the main experimental variables 
(storage load and selection load) were found 
significant (p < .001). The results are sum- 
marized in Figure 15. The interaction of stor- 
age and selection load is not significant (see 
Table 4). In order to determine whether the 
effects obtained in both the first and second 
trial on each type of problem, separate analyses
were carried out on first trial and second
trial data. The results for each were the same
as for the combined data. Storage and selection
load were significant; their interaction
was not. The only marked change was that,
in the second trial data, the significance of
selection load dropped from the .001 to the
.01 level.


TABLE 4

ANALYSIS OF VARIANCE: EXPERIMENT V

Source                  df      MS          F
Between subjects        29
Within subjects         90
  Storage load           1    3010.008    89.736*
  Selection load         1     200.208    15.585*
  Interaction            1       7.008     1.118
  Error (1)             29      33.543
  Error (2)             29      12.846
  Error (3)             29       6.267

*p < .001.


FIG. 15. Experiment V: Mean number of correct
answers as a function of storage and selection load.
(The first letter refers to storage load and the second,
to selection load: LL, low storage, low selection
load; LH, low storage, high selection load, etc.
Horizontal axis: storage load, low and high;
vertical axis: mean number correct.)

Learning curves for the four problem types 
are presented in Figure 16. There is a marked 
improvement in all four problem types over 
the successive sets of problems of each type. 
The same ordering of experimental conditions 
is found over the four sets of trials, except for 
a reversal between LL and LH in the first set. 
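The F ratios reported in Table 4 can be checked against the printed mean squares. In the sketch below, pairing each effect with an error term in the order listed (Error (1) with storage load, Error (2) with selection load, Error (3) with the interaction) is an assumption; the table itself does not state the pairing.

```python
# Mean squares and F ratios as printed in Table 4. The pairing of effects
# with error terms follows the order of listing and is an assumption.
table_4 = {
    "Storage load": (3010.008, 33.543, 89.736),
    "Selection load": (200.208, 12.846, 15.585),
    "Interaction": (7.008, 6.267, 1.118),
}

computed = {}
for effect, (ms_effect, ms_error, f_printed) in table_4.items():
    computed[effect] = ms_effect / ms_error
    print(f"{effect}: F = {computed[effect]:.3f} (printed {f_printed})")
```

Under this pairing each computed ratio agrees with the printed value to three decimal places, which supports reading the three error terms as belonging to the three effects in order.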


Discussion 


The findings show that increase of load on 
either the storage function or the selection 
function decreases the efficiency of the con- 
cept work. These two factors do not interact. 
The changes in performance that can be at- 
tributed to learning are regular. The effect of 
storage load is considerably greater than that 
of selection load. This is probably due, how- 
ever, to the fact that the mean perceptual diffi- 
culty of the stimuli used to define the high and 
low storage vectors differed more than the 
mean perceptual difficulty of the stimuli used 
to define the high and low selection vectors. 
This difference arose from the restriction of
selection vectors to those with four relevant
and four irrelevant dimensions, since all the
problems had four-dimension solutions. Only
the stimuli with four black and four white
figures could, therefore, be used in constructing
selection vectors for these problems. The
load on storage and selection that was varied
in this experiment can, of course, be referred
to as information load. It is important, however,
to emphasize the role of encoding in determining
information load. Examination of
the data on the perceptual difficulty of the
examples reveals that difficulty is closely related
to the simplicity of the code or verbal
description that can be used for each example.¹¹
Thus, the array B B B B B W W W,
which can be encoded simply, is much easier
to report accurately than the array B B W B
W B B W.


FIG. 16. Experiment V: Learning curves for the
four sets of problems. (Each point represents five
problems. Horizontal axis: sets of problems;
vertical axis: mean number correct.)
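The contrast between the two arrays just mentioned can be illustrated with a run-length code. This is only a crude stand-in for the "code or verbal description" discussed in the text: treating the number of runs as a measure of coding simplicity is an assumption of this sketch, not a measure used in the study.

```python
from itertools import groupby


def run_length_code(array):
    """Describe an array as (count, color) runs, e.g. 'BBBWW' -> [(3, 'B'), (2, 'W')]."""
    return [(len(list(group)), color) for color, group in groupby(array)]


easy = "BBBBBWWW"  # the array cited in the text as simple to encode
hard = "BBWBWBBW"  # the array cited in the text as harder to report
print(run_length_code(easy))       # [(5, 'B'), (3, 'W')]
print(len(run_length_code(hard)))  # 6
```

The first array reduces to two runs ("five black, three white"), while the second needs six, which is one simple way to see why a short description is available for one and not the other.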

It is particularly important to emphasize 
the role of encoding when generalizing from 
this study. In studies with other types of con- 
cept examples, it is possible to proceed as was 
done here: carry out preliminary measure- 
ment on the examples and derive estimates of 
storage and selection on the basis of the meas- 
ures obtained. By attending to the role of 
encoding, however, it is possible to short-cut 
or simplify the procedure. The experimenter,
for example, can make use of previously 
learned codes. It is probable that data on 
height and width of geometrical objects are 
grouped to permit simple encoding. Thus, an 
object that is both tall and wide can be coded 
as large; an object that is both short and narrow
can be coded as small. If height and
width required separate processing by the
subject, simple coding would be prevented.
It would be expected that concept problems 
with blocks such as those used in Experiment 
IV could be constructed so as to prevent sim- 
ple coding in either storage or selection. Thus, 
storage load might be expected to be high and 
the problem more difficult if the initial positive
example were short (small in height),
but wide (large in width). Selection load
might be expected to be high and the problem
more difficult if the selection vector made
height relevant, but width irrelevant. Closely
related to a procedure that makes use of
previously learned codes is one that makes
use of word associations or semantic relations
between example dimensions and their values.
The data on the relation between words generated
in studies employing the semantic differential
technique could be readily adapted to
construct problems in which storage and selection
load are varied experimentally.


¹¹ This point is developed fully in the paper by
Glanzer and Clark (1963). Subsequent studies by
Glanzer and Clark have demonstrated that the relation
of difficulty to the simplicity of the code or
verbal description is a general one. The relation holds
equally well for sets of conventional designs like
those used by Smoke (1932).

There are, then, at least three ways in which 
the experimenter can obtain load estimates 
that will give him a basis for manipulating 
load on the storage and selection functions: 
(a) by estimates based on measured charac- 
teristics of the concept examples (this was 
done in the present experiment), (b) on the 
basis of assumptions about the subject’s en- 
coding, (c) on the basis of word associations 
or semantic relations. The experimenter can 
also manipulate the load on storage and selec- 
tion without obtaining load measures, by giv- 
ing the subjects preliminary training with the 
experimental materials. By exposing the sub- 
ject beforehand to particular examples, the en- 
coding of these examples should be made 
more efficient. Storage load in problems that 
include these examples should be decreased. 
By exposing the subject beforehand to series 
of examples in which varying correlations are 
maintained between dimensions, it should be 
possible to control the difficulty of various 
selection vectors. 

In summary, a method was developed for 
manipulating load on both the storage and selection
functions in solving concept problems.
Both factors have been demonstrated to have 
an effect on the efficiency of the operations 
carried out by the subjects. In doing this, 
“cognitive strain” has been measured and ex- 
perimentally varied. Methods for manipulat- 
ing storage and selection load with other types 
of experimental material were also considered. 


EXPERIMENT VI: INFORMATION RATE


In the preceding experiments, the examples
in a series changed either at a minimal rate,
permitting the subject to eliminate one irrelevant
dimension per example (Experiments
I-IV), or at a maximal rate, permitting
the subject to eliminate all the irrelevant
dimensions with two examples (Experiment V).


In the terminology of the preceding experi- 
ment, the initial positive example generates a 
storage vector which contains information on 
the values of the possibly relevant dimensions. 
The subsequent examples—in particular their 
similarities to, and differences from, the initial 
positive example—generate a selection vector 
which contains information on the relevance or 
irrelevance of each dimension. The solution is 
obtained by combining the storage and selec- 
tion vectors. The same selection vector can be 
generated by few or many subsequent exam- 
ples. With few examples, the amount of in- 
formation carried by each example is high. 
With many examples, the amount of informa- 
tion carried by each example is low. The pur- 
pose of Experiment VI was to vary the rate of 
information presented by the example series 
in order to determine the effect of information 
rate on the efficiency of the subject’s per- 
formance and the interaction of information 
rate with the variables of storage and selection 
load. Two experiments were carried out using 
two different types of example series to present 
problem information to the subjects. 


a: Origin-Based Series 
Method 


The general procedure (instructions, practice, re- 
cording of answers, etc.) and equipment were the 
same as that for the preceding experiment. 

Subjects. A group of 30 Army enlisted men served 
as subjects. They were average or above average on
Army intelligence tests.

Problems. Seventy-two four-dimension concept 
problems were used, adapted from the problems used 
in Experiment V. As in Experiment V, there were 
four classes of problems, divided on the basis of the 
load each imposed on storage and selection (LL: low
storage load, low selection load; LH: low storage
load, high selection load; HL: high storage load, low
selection load; HH: high storage load, high selection
load). Each of these classes of problems, however,
was further subdivided into problems given at one of
the following information rates: 1, 2, 4. With Information
Rate 1, each example in a problem differed
in one dimension from the initial example. (When a
dimension changes value in a series of positive examples,
the subject is being presented with information
that the dimension is irrelevant.) With Information 
Rate 4, all irrelevant dimensions changed. There were 
24 problems at each information rate, distributed 
equally across the four combinations of storage and 
selection load. Figure 17 shows the same problem 
with information rates of 1, 2, and 4. 

Procedure. The 72 problems were given in four 
sessions. Each session consisted of 18 problems and 
took approximately 40 minutes. The sessions were 
separated by 10-minute rest periods. As in the pre- 
ceding experiments, each problem was given twice. 
The subjects were told that the problems would con- 
sist of only positive examples and that all problems 
would have four-dimension solutions. 


FIG. 17. Origin-based problem with information
rate of 1, 2, and 4. (All examples are positive
instances. The answer is the same for all three
versions: “black circle, white square, white club, and
black triangle.”)


FIG. 18. Experiment VIa (Origin-based problems):
Mean number of correct answers in conditions
LL, LH, HL, and HH, as a function of information
rate.


TABLE 5

ANALYSIS OF VARIANCE: EXPERIMENT VIa

Source                  df      MS         F
Between subjects        29
Within subjects        330
  Storage load (St)      1    520.803    96.858**
  Selection load (Se)    1    128.403    37.479**
  Rate (R)               2     17.811     5.379*
  St X Se                1       .803      --
  St X R                 2     55.678    27.213**
  Se X R                 2     26.978     8.594**
  St X Se X R            2     14.011     5.091*
  Error (1)             29      5.377
  Error (2)             29      3.426
  Error (3)             58      3.311
  Error (4)             29      2.734
  Error (5)             58      2.046
  Error (6)             58      3.139
  Error (7)             58      2.752

*p < .01.
**p < .001.




Results 


The results are summarized in Figure 18. 
As in Experiment V, the effects of both storage
load and selection load are significant
(p < .001) with no interaction between the 
two (see Table 5). The effect of information 
rate, the main new variable of this experi- 
ment, is also significant (p < .01). There is 
a significant (p < .001) overall decline in 
number of solutions as the information rate 
increases from 1 to 2. As the rate increases 
from 2 to 4, however, the overall decline does 
not continue. There is, instead, a slight, non- 
significant increase. (The increase between 
Rates 2 and 4 is found in the LL, LH, and HL 
problems. Only, however, in the LL problems 
is the increase significant—p < .05.) The LL 
problems, contrary to the other problems, 
show a monotonic, statistically significant 
(p < .001) increase in number of solutions 
as information rate increases. There is a sig- 
nificant second order interaction: storage load 
by selection load by rate (p < .01). Use of 
conservative tests with reduced degrees of 
freedom (Greenhouse & Geisser, 1959) does 
not affect the findings summarized in Table 5, 
except to reduce the effect of rate to the .05 
level, selection load by rate interaction to the 
.005 level, and storage load by selection load 
by rate interaction to the .05 level. Before 
considering these results further, a second, 
twin experiment will be considered. The sec- 
ond experiment differs from the first only in a 
characteristic of the example series used. 


b: Predecessor-Based Series 


In Experiment VIa, each successive example 
differed in one, two, or four dimensions from 
the initial example, depending on the informa- 
tion rate used in that problem. This type of 
example series, in which each example differs 
with respect to the first example in a given 
number of dimensions, was labeled origin 
based. There is another way in which the 
same information can be presented to the 
subjects with similar variations in rate: by 
having each successive example change with 
respect to the immediately preceding example 
in the series, rather than with respect to the 
initial example. This type of example series
was labeled predecessor based. In Figure 17,
the problems are all origin based. In Figure 
19, the same problems are shown with a prede- 
cessor-based procedure. Origin-based example 
series were used in Experiments II-IV; predecessor-based,
in Experiment I. Examination
of Figures 17 and 19 shows that: 


1. At an information rate of 1, the predecessor-based
examples each differ in one dimension
from the preceding example. The origin-based
examples show the following number of
such changes: one, two, two, two.


2. At an information rate of 2, the predecessor-based
examples each differ in two dimensions
from the preceding example. The origin-based
examples show the following number of
such changes: two, four.


3. At an information rate of 4, the predecessor-based
and origin-based series are identical
for these problems.


FIG. 19. Predecessor-based problem with information
rate of 1, 2, and 4. (All examples are positive
instances. The answer is the same for all three
versions. Compare with Figure 17.)


From the point of view of changes from 
example to example, the origin-based series 
appears, in general, somewhat more complex, 
i.e., on the average there are more changes
from example to example. 
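The counts of example-to-example changes listed above can be reproduced with a short sketch. The initial array, the choice of irrelevant positions, and the function names are assumptions made for illustration; only the two construction rules (change relative to the initial example, or relative to the preceding example) come from the text.

```python
def flip(example, positions):
    """Return a copy of the example with the given positions switched B <-> W."""
    swap = {"B": "W", "W": "B"}
    return "".join(swap[v] if i in positions else v for i, v in enumerate(example))


def origin_based(initial, irrelevant, rate):
    """Each example differs from the initial example in `rate` new dimensions."""
    chunks = [irrelevant[i:i + rate] for i in range(0, len(irrelevant), rate)]
    return [initial] + [flip(initial, chunk) for chunk in chunks]


def predecessor_based(initial, irrelevant, rate):
    """Each example differs from the preceding example in `rate` new dimensions."""
    series = [initial]
    for i in range(0, len(irrelevant), rate):
        series.append(flip(series[-1], irrelevant[i:i + rate]))
    return series


def successive_changes(series):
    """Number of dimensions that change between each example and the next."""
    return [sum(a != b for a, b in zip(prev, nxt))
            for prev, nxt in zip(series, series[1:])]


initial = "BWBWWWWW"       # illustrative initial example
irrelevant = [3, 4, 5, 6]  # zero-based positions of the four irrelevant dimensions
for rate in (1, 2, 4):
    print(rate,
          successive_changes(origin_based(initial, irrelevant, rate)),
          successive_changes(predecessor_based(initial, irrelevant, rate)))
# 1 [1, 2, 2, 2] [1, 1, 1, 1]
# 2 [2, 4] [2, 2]
# 4 [4] [4]
```

In the origin-based series a previously varied dimension returns to its initial value whenever a new one is varied, which is why its successive changes exceed those of the predecessor-based series at rates 1 and 2 and coincide at rate 4.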

The purpose of this experiment was to re- 
peat the variations used in Experiment VIa 
with predecessor-based example series, in order
to examine the effect of the change on the 
relations found in Experiment VIa. 


Method 


Procedure. The procedure was identical with that 
used in Experiment VIa. 

Subjects. The subjects were 37 Army enlisted men 
drawn from the same unit that furnished the subjects 
for Experiment VIa. They were average or above 
average on Army intelligence tests. Their mean in- 
telligence test scores did not differ significantly from 
the mean for the subjects of Experiment VIa. 


Results 


The analysis of variance (see Table 6) of 
the number of correct responses gives results 
similar to those found in Experiment VIa. The
main effects of storage load (p < .001), selection
load (p < .001), and rate (p < .001)
are all significant. The interactions of storage
load with rate and of selection load with rate are
again significant (p < .001), while the interaction
of storage and selection load is not.
There is, however, no significant second order
interaction here of storage X selection X rate.


TABLE 6

ANALYSIS OF VARIANCE: EXPERIMENT VIb

Source                  df      MS          F
Between subjects        36
Within subjects        407
  Storage load (St)      1    876.973    177.489*
  Selection load (Se)    1    111.000     43.207*
  Rate (R)               2     84.766     31.465*
  St X Se                1       .441       --
  St X R                 2     31.973     14.017*
  Se X R                 2     31.459     17.088*
  St X Se X R            2      4.225      1.626
  Error (1)             36      4.941
  Error (2)             36      2.569
  Error (3)             72      2.694
  Error (4)             36      1.807
  Error (5)             72      2.281
  Error (6)             72      1.841
  Error (7)             72      2.598

*p < .001.


FIG. 20. Experiment VIb (Predecessor-based problems):
Mean number of correct answers in conditions
LL, LH, HL, and HH, as a function of information
rate.

The means for the various conditions are 
summarized in Figure 20. The general pattern 
of the results is simpler than that found in Ex- 
periment VIa, although basically similar. The 
basic similarity of the two sets of data is 
brought out by ranking the 12 points in Fig- 
ure 18 and the 12 points in Figure 20. The 
rank-order correlation between the two sets 
of points is .95. The data show a significant 
overall decline in the number of solutions as 
the information rate increases, this decline 
continuing over the range of information rates 
used. 

Information rate, however, interacts with 
both selection load and storage load. When 
either storage load or selection load is high,
the effect of increasing the information rate
becomes more marked. There is also a tend- 
ency for the low-storage, low-selection load 
problems to give more solutions at the high 
information rates. A test of the differences 
between the three LL means reveals, however, 
that they do not differ significantly. 

The origin-based problems of Experiment 
VIa are generally more difficult than the 
matched predecessor-based problems of Ex- 
periment VIb. The difference between the 
means for Experiments VIa and VIb is significant
(t = 3.00, df = 65, p < .01). Differences
favoring the predecessor-based problems
are found at each of the information rate
levels. The largest difference is found at the 
information rate of 1, and the smallest differ- 
ence, as would be expected, at the information 
rate of 4. There may be two reasons why the 
origin-based series, in general, are more diffi- 
cult than the predecessor-based series. 


1. In an origin-based series every example 
after the second introduces more changes than 
the corresponding examples in a predecessor-based
series. There are the changes in which
a dimension is varied for the first time, plus the
changes in which a previously varied dimen- 
sion is returned to its initial value. 


2. In a predecessor-based series, particu- 
larly at the lower information rates, the sub- 
ject can miss several examples, but can still re- 
cover the information for the selection vector 
by comparing the last example with the stored 
values for the initial example. This type of 
redundant information is not available in the 
origin-based series. 
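
The two reasons given above can be made concrete with a small sketch. The Python code below assumes binary dimensions, an initial example coded as all zeros, and a fixed number of irrelevant dimensions newly varied per example (the information rate); these coding choices and all function names are assumptions for illustration, not the construction used in the experiments. It counts the changes between successive examples in each type of series and checks which irrelevant dimensions can be recovered by comparing only the first and last examples.

```python
def build_series(n_dims, irrelevant, rate, origin_based):
    """Build a series of positive examples as lists of 0/1 dimension values.

    Each new example varies `rate` irrelevant dimensions for the first time.
    Origin-based: changes are applied to the initial example.
    Predecessor-based: changes are applied to the preceding example.
    """
    first = [0] * n_dims
    series = [first]
    for start in range(0, len(irrelevant), rate):
        base = first if origin_based else series[-1]
        example = list(base)
        for d in irrelevant[start:start + rate]:
            example[d] = 1 - example[d]
        series.append(example)
    return series


def successive_changes(series):
    """Number of dimensions that differ between each example and its predecessor."""
    return [sum(x != y for x, y in zip(a, b)) for a, b in zip(series, series[1:])]


def recovered_from_first_and_last(series):
    """Dimensions shown to be irrelevant by comparing only the first and last examples."""
    return [d for d, (x, y) in enumerate(zip(series[0], series[-1])) if x != y]


irrelevant = [0, 1, 2, 3]  # hypothetical: 4 irrelevant dimensions out of 6
origin = build_series(6, irrelevant, rate=1, origin_based=True)
predecessor = build_series(6, irrelevant, rate=1, origin_based=False)

origin_changes = successive_changes(origin)            # [1, 2, 2, 2]
predecessor_changes = successive_changes(predecessor)  # [1, 1, 1, 1]
origin_recovered = recovered_from_first_and_last(origin)            # [3]
predecessor_recovered = recovered_from_first_and_last(predecessor)  # [0, 1, 2, 3]
```

Under these assumptions, every origin-based example after the second differs from its predecessor on twice as many dimensions, and only the predecessor-based series lets a subject who missed the intermediate examples recover the full selection information from the last example.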


There is a possibility that the generally 
facilitative effect of the lower information 
rates was due not to the information rate, but 
to the longer period between the first and last 
examples in a series. In order to check this, 
a series of experiments was carried out in 
which comparisons were made between ex- 
ample series with various levels of information 
rate and example series with rest intervals of 
the same length inserted between the first 
and last examples. Although an overall tend- 
ency for the rest intervals to improve per- 
formance could be observed, the effect was 
slight, unreliable, and had complex, irregular 


interactions with storage and selection load. 
The effect of the interposed rest interval, when 
it could be demonstrated, was in any case too 
small to account for the effects of information 
rate obtained here. 

In summary, two experiments were carried 
out to determine the effect of information rate, 
the rate at which irrelevant dimensions are 
eliminated in a series of positive concept ex- 
amples. Both experiments indicate that as 
information rate increases, the probability of 
solving the concept problem decreases. This 
overall effect interacts significantly with the 
storage and selection load imposed by the 
problems. As either of these increases, the 
effect of information rate increases. The state- 
ments above apply most clearly to the prede- 
cessor-based example series used in Experi- 
ment VIb, in which each example changes 
minimally from its predecessor. When the 
somewhat more difficult origin-based example 
series is used, as in Experiment VIa, some ex- 
ceptions to the general statements appear. 


SUMMARY AND DISCUSSION 


The experiments above have explored some 
of the factors that control the efficiency of 
performance on the family of concept prob- 
lems introduced by Smoke, and studied by 
Hovland and other investigators. The effects 
of the following factors were examined. 


1. Example sign—whether the examples are 
positive or negative instances. The findings 
confirmed the finding of Hovland and Weiss 
(1953) that it is easier to solve all-positive 
example series than it is to solve mixed ex- 
ample series. 


2. Concept size—the ratio of the number 
of relevant dimensions to the total number 
of example dimensions. This was a new vari- 
able that has a curvilinear effect on perform- 
ance. The effect was explained in terms of 
information load on one stage of the system- 
atic operations carried out by the subject: the 
selection of relevant dimensions. 


3. Series complexity—the presence of super- 
fluous information. Although there was a tend- 
ency for superfluous information to lower the 
efficiency of performance, no significant effect 
was found.




4. Information order—the sequence of ex- 
amples in series consisting of mixed positive 
and negative examples. The findings on two of 
the orders tested confirmed earlier, nonsignificant
findings of Hovland and Weiss. The use
of an additional experimental condition, how- 
ever, showed that the order of the information 
does have a significant effect. The results 
were related to the systematic operations re- 
quired of the subject. 


5. Storage load—the amount of information 
that has to be stored at the beginning of a 
problem. A technique was devised for predict- 
ing the storage load imposed by a problem 
and the effect of this variable demonstrated. 


6. Selection load—the amount of informa- 
tion required to sort the example dimensions 
into relevant and irrelevant. A technique was 
also devised for predicting the selection load 
imposed by a problem and its effect demonstrated.


7. Information rate—the rate at which new 
information is presented within the example 
series. This variable was shown to interact 
with both storage load and selection load. 


The findings make it possible to control the 
probability of solution of the concept prob- 
lems. They also make it possible to specify 
in considerable detail the systematic operations 
carried out by the subjects in solving the prob- 
lems. The operations consist of two distinct 
stages: (a) specification and storage of dimension
values, and (b) selection of relevant dimensions
on the basis of example information.


Generality of Findings 


One of the first questions of interest con- 
cerns the generality of the findings. The re- 
sults were obtained with a simplified example 
series in a highly restricted experimental situa- 
tion. There is evidence, however, that the 
regularities found here hold for the entire family
of Smoke-Hovland problems. When replications
were attempted of the work of other
investigators who used different materials and 
more relaxed experimental conditions, the 
replications were successful. Thus, the Hov- 
land and Weiss findings on positive, versus 
mixed, series, and their findings on the effect 



of placing the positive example at the begin- 
ning, versus the end, of a mixed series were 
confirmed. It was also shown that the curvi- 
linear effect of concept size, found first with 
simplified examples, could be demonstrated 
with more conventional types of examples. 
Finally, the fact that all-negative series of 
examples are nearly impossible to solve when 
given serially (see Cahill & Hovland, 1960; 
Hovland & Weiss, 1953) agrees with the find- 
ings here on the central role of storage and 
selection of dimensions. With all-negative 
example series, the operation of storage and 
selection of dimensions cannot be used to solve 
the concept problems. The operation required 
by all-negative series is storage and selection 
of hypotheses. 

There is another and different question that 
may be asked concerning the generality of the 
results: are they applicable to other families 
of concept problems such as those derived 
from discrimination learning arrangements 
(see Archer et al., 1955; Bourne & Haygood, 
1959)? These problems impose a simple stimulus-response
reinforcement structure on the
concept solving process, with the individual 
examples as stimuli and the subject’s attempts
at categorization as responses. Although these 
problems are quite different in structure from 
the one used in the experimental work reported 
here, the findings can be used to clarify the 
parameters in the discrimination learning ar- 
rangements and also can be used as a basis 
for alternative explanation of findings ob- 
tained with those arrangements. The proce- 
dure used in experiments derived from discrim- 
ination learning arrangements is as follows. 
A random series of examples is presented 
successively to the subject, and the subject 
has to make one of several responses to each 
example. He is given information as to the 
correctness of his response on some or all of 
the trials. 

Let it be assumed that the subject is using 
systematic operations like those discussed for 
the Smoke-Hovland problems, and that he at- 
tempts to use the positive instances of one of 
the categories to define the category. The 
following can be argued: 


1. For a given number of relevant dimen- 
sions, as the number of irrelevant dimensions 




increases, the number of trials to solution 
(and therefore, also the number of errors) 
should increase. There are two factors in- 
volved in deriving this statement. One factor 
is that of storage load, which would increase 
as the total number of example dimensions 
increases. The other factor is the informa- 
tional structure of a randomly selected series 
of examples. It can be shown that as the 
number of irrelevant dimensions increases 
(and therefore, the total number of different 
examples), there is an increase in the average 
number of randomly selected examples needed 
before the concept is logically defined. Thus, 
if the subject is solving by systematic opera- 
tions, he would give results in line with the 
findings of Bourne (1957) and Bourne and 
Haygood (1959). 


2. By the same reasoning, for a fixed num- 
ber of irrelevant dimensions, as the number of 
relevant dimensions increases, there is an in- 
crease in the number of trials to solution 
(and therefore, the number of errors). This 
again is in line with the data of Bourne and 
Haygood (1959). 


3. If there is no feedback on the category 
of an example, that example cannot be used 
by the subject carrying out systematic opera- 
tions. Therefore, the smaller the proportion 
of examples presented with feedback, the 
larger the number of examples needed to com- 
plete definition of the concepts. This would 
appear in the data as a greater number of 
trials to criterion, or a greater number of 
errors. The effect of probability of feedback
that would be expected on the basis of sys- 
tematic operations by the subject agrees with 
the findings of Bourne and Pendleton (1958). 
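
The three arguments above can be worked through numerically under strong simplifying assumptions that are introduced here and are not part of the original studies: every dimension is binary; the concept is a conjunction of one value on each relevant dimension; examples are drawn independently and uniformly at random; the subject uses only positive examples that are accompanied by feedback; and an irrelevant dimension is eliminated as soon as a usable positive example differs on it from the first usable positive example. Storage load is not modeled. The function names are hypothetical.

```python
def expected_positive_examples(n_irrelevant, tol=1e-12):
    """Expected number of usable positive examples, counting the first,
    until every irrelevant dimension has differed from the first example.

    After the first example, each later example differs from it on a given
    irrelevant dimension with probability 1/2, so the wait is the maximum of
    n_irrelevant independent geometric variables:
    E[max] = sum over n >= 0 of 1 - (1 - 2**-n) ** n_irrelevant.
    """
    if n_irrelevant == 0:
        return 1.0
    total, n = 0.0, 0
    while True:
        term = 1.0 - (1.0 - 0.5 ** n) ** n_irrelevant
        if term < tol:
            break
        total += term
        n += 1
    return 1.0 + total


def expected_trials(n_relevant, n_irrelevant, p_feedback=1.0):
    """Expected number of trials until the concept is logically defined.

    A random example is positive with probability 2**-n_relevant, and it is
    usable only if feedback is given (probability p_feedback).
    """
    if not 0.0 < p_feedback <= 1.0:
        raise ValueError("p_feedback must lie in (0, 1]")
    p_usable = (0.5 ** n_relevant) * p_feedback
    return expected_positive_examples(n_irrelevant) / p_usable


# Argument 1: more irrelevant dimensions, more trials.
by_irrelevant = [expected_trials(1, k) for k in (1, 2, 3, 4)]
# Argument 2: more relevant dimensions, more trials.
by_relevant = [expected_trials(r, 2) for r in (1, 2, 3)]
# Argument 3: lower probability of feedback, more trials.
by_feedback = [expected_trials(1, 2, p) for p in (1.0, 0.5, 0.25)]
```

Each of the three lists increases from left to right, which matches the direction of the effects described above. The sketch says nothing about their size in any actual experiment.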


Since the discrimination learning type of 
arrangement as presently used is not struc- 
tured to permit easy analysis of the role of 
systematic operations by the subject, and since 
such analyses have not been made, the state- 
ments above merely indicate an alternative 
way of viewing the data that are derived 
from a situation designed to underline the 
similarities of concept solving to a type of 
simple learning. The same extension of the
findings can be made to other families of 
concept problems, such as those modeled after 


verbal learning situations (Heidbreder, 1947; 
Hull, 1920; Oseas & Underwood, 1952). 

Thus there is evidence that the findings of 
the studies above generalize to performance on 
other problems out of the same family. There 
is a basis, furthermore, for using the findings 
to construct alternative explanations for per- 
formance on other types of problems, e.g., 
discrimination learning type of concept 
solving. 


Implications for Method 


The work carried out above has made use 
of a special technique for the experimental 
analysis of concept problem solving. This has 
been the use of many, repeated trials, not with 
successive examples of a single problem, but 
with different representatives of a class of 
problems. Most work on problem solving 
concerned with higher-order processes has kept 
fairly close to a one-problem, one-outcome 
procedure. The procedure is an inherently 
unreliable one, and one that makes it difficult 
to see the systematic operations and their 
formation, factors of central interest in this 
area.

In the experiments above, the subjects 
solved a series of problems drawn from a given 
family of problems. The procedure is similar 
to that used in learning set experiments. Since 
the subjects undergo a series of problems, it 
is possible to trace the development of the 
performance and its asymptotic level. By
using all the data over the series of problems, 
it is possible to see the effect of the experi- 
mental variables (e.g., the particular type of 
example series used) in determining both the 
rate at which the performance develops and 
its asymptotic level. By the analysis of the 
effects of the experimental variables on rate 
and asymptote, it is possible to see the inner 
structure of this performance. It is this type 
of information that is necessary if stable re- 
sults are to be obtained on higher-order proc- 
esses in the area of problem solving. 

This multiple-problem technique, which 
forces the subject to discover and maintain an 
optimal method, is not designed to give certain 
types of information to the investigator of 
problem solving. For example, it does not, as 
used here, give information as to the variety 




of responses made by the individual subjects 
at their first exposure to the experimental 
situation. This type of information could, 
however, be analyzed out of the error data 
from the early problems. It does not give 
information on changes in responses within a 
single problem. For this type of information, 
it is necessary to use an experimental arrange- 
ment more like the discrimination learning 
arrangement of Archer et al. (1955). It was
pointed out above, however, that these dis- 
crimination learning arrangements make it 
difficult to discern the part played by system- 
atic operations in determining the measured 
response. 

The multiple-problem or learning set 
method used here is potentially valuable for 
all types of problem solving hitherto studied 
by one-problem methods, because it may be 
viewed as forcing a high-order process to dis- 
play itself over a longer period of time and as 
it develops to a stable form. It is suggested 
here, therefore, that areas of problem solving 
that have become identified with a one-prob- 
lem procedure (e.g., the study of insight) 
could profit from experimental examination of 
performance over series of such problems. 
Identifying problem solving with the confu- 
sions of first-problem response restricts the 
field of study to its own disadvantage. The 
use of one-problem procedure is also disadvan- 
tageous in that it leaves as a major uncon- 
trolled factor the subject’s training before he 


enters the experimental situation. The multiple-problem
procedure permits the subject’s
training to be brought under control within 
the experimental situation. 


Implications for Computer Simulation and 
Information Theory Analyses of Problem 
Solving 


Recently there have been a number of in- 
vestigations concerned with computer simu- 
lation of cognitive processes. The work on 
concept problems by Hovland and his associ- 
ates was motivated by an attempt to carry out 
computer simulation of this type of problem 
solving. This is evidenced by both the early 
informational analysis of the problem and by 
the later work on computer simulation (Hunt 
& Hovland, 1961). The work on computer 


simulation attempts to approximate human 
problem solving performance by means of an 
appropriately programed computer. Although 
the image of the human subject as a complexly 
programed computer is a promising one, the 
approach has several inherent difficulties. 
One is that a program is a complex affair and 
is not a satisfactory substitute for a theory. 
Another is that it is possible to simulate by the 
use of operations that are not used by the 
human subject. Thus, for example, a computer 
could be programed to handle the concept 
problems on the basis of hypothesis storage 
and selection. With the addition of program- 
ing to include randomization and delays, it 
should be possible to reproduce some aspects 
of average performance. Since there are many 
devices that could be used to maintain close 
approximation of the computer’s final output 
to the subject’s performance, there would 
probably be considerable time before it was 
recognized that the basic operation used in 
the computer program did not correspond to 
the operation carried out by human subjects. 

Direct experimental analysis of the per- 
formance to be simulated in simplified situa- 
tions may be the simple and economical way 
to start a program of simulation. Absence of 
direct experimental information about the sub- 
ject’s performance can also generate similar 
difficulty for attempts to apply general theories 
such as information theory to human problem 
solving. Thus, Hovland, using an information 
processing approach to the family of problems 
studied here, was unable to explain his experi- 
mental results satisfactorily, because he had 
assumed that the subject’s systematic operations
consisted of the elimination of hypotheses.
It has been shown here that the subject
operates not on the hypotheses, but on the 
example dimensions. Once the systematic 
operations elicited by a task are known, it 
becomes possible to apply the ideas of infor- 
mation theory profitably. 


Experimental Definition of Systematic Operations


Although introspective data indicate that a 
subject carries out a variety of complex, sys- 
tematic operations during the course of prob- 
lem solving, there has been little information 




on the nature of these operations: the varia- 
bles that encourage their appearance and the 
variables that determine their efficiency. This 
lack of information, combined with the mentalistic
connotations of such responses, has
led psychologists to avoid their study and 
has led them to construct molecular theories 
of problem solving (e.g., mediational theo- 
ries). 

In order to construct theories in which the 


responses correspond to the complex series of 
events reported introspectively, it is necessary 
to give such responses experimental definition. 
Theories of this sort have been called for 
persistently in psychology (see the recent call 
by Miller, Galanter, & Pribram, 1960). The 
attempt has been made here to give some of 
the basic systematic operations carried out by 
the subject the necessary experimental 
definition. 


REFERENCES 


Archer, E. J., Bourne, L. E., Jr., & Brown, F. G.
Concept identification as a function of irrelevant
information and instructions. J. exp. Psychol.,
1955, 49, 153-164.

Bourne, L. E., Jr. Effects of delay of information
feedback and task complexity on the identification
of concepts. J. exp. Psychol., 1957, 54, 201-207.

Bourne, L. E., Jr., & Haygood, R. C. The role of
stimulus redundancy in concept identification. J.
exp. Psychol., 1959, 58, 232-238.

Bourne, L. E., Jr., & Pendleton, R. B. Concept
identification as a function of completeness and
probability of information feedback. J. exp. Psychol.,
1958, 56, 413-420.

Bruner, J. S., Goodnow, Jacqueline J., & Austin,
G. A. A study of thinking. New York: Wiley,
1956.

Cahill, H. E., & Hovland, C. I. The role of memory
in the acquisition of concepts. J. exp. Psychol.,
1960, 59, 137-144.

Glanzer, M., & Clark, W. H. Accuracy of perceptual
recall: An analysis of organization. J.
verbal Learn. verbal Behav., 1963, 1, 289-299.

Greenhouse, S. W., & Geisser, S. On methods in
the analysis of profile data. Psychometrika, 1959,
24, 95-112.

Hanfmann, E., & Kasanin, J. A method for the
study of concept formation. J. Psychol., 1937, 3,
521-540.

Heidbreder, E. The attainment of concepts: II.
The process. J. Psychol., 1947, 24, 93-138.

Hovland, C. I. A “communication analysis” of concept
learning. Psychol. Rev., 1952, 59, 461-472.

Hovland, C. I., & Weiss, W. Transmission of information
concerning concepts through positive and
negative instances. J. exp. Psychol., 1953, 45, 175-182.

Hull, C. L. Quantitative aspects of the evolution of
concepts. Psychol. Monogr., 1920, 28(1, Whole No.
123).

Hunt, E. B. Memory effects in concept learning.
J. exp. Psychol., 1961, 62, 598-604.

Hunt, E. B., & Hovland, C. I. Order of consideration
of different types of concepts. J. exp. Psychol.,
1960, 59, 220-225.

Hunt, E. B., & Hovland, C. I. Programming a model
of human concept formulation. Proc. West. Joint
Comput. Conf., 1961, 19, 145-155.

Levine, M. A model of hypothesis behavior in discrimination
learning set. Psychol. Rev., 1959, 66,
353-366.

Metzger, R. A comparison between rote learning
and concept formation. J. exp. Psychol., 1958, 56,
226-231.

Miller, G. A., Galanter, E., & Pribram, K. H.
Plans and the structure of behavior. New York:
Holt, 1960.

Oseas, L., & Underwood, B. J. Studies of distributed
practice: V. Learning and retention of concepts.
J. exp. Psychol., 1952, 43, 143-148.

Shepard, R. N., Hovland, C. I., & Jenkins, H. M.
Learning and memorization of classification. Psychol.
Monogr., 1961, 75(13, Whole No. 517).

Smoke, K. L. An objective study of concept formation.
Psychol. Monogr., 1932, 42(4, Whole No.
191).

Snedecor, G. W. Statistical methods. Ames: Iowa
State Coll. Press, 1946.


(Received June 29, 1962) 


Psychological Monographs: General and Applied 


RANGE-FREQUENCY COMPROMISE IN JUDGMENT ¹


ALLEN PARDUCCI 
University of California, Los Angeles 


Whole No. 565, 1963 


Vol. 77, No. 2


The experiments were designed to provide a basis for evaluating alternative
characterizations of the frame of reference for judgment. Absolute judgments
of size, weight, or numerousness were obtained using either verbal categories
or magnitude estimations. Adaptation level, a measure of the centering of the
response scale, was demonstrated to vary directly with independent variation of
both the midpoint and the median, but not with the mean, of the distributions
of stimuli presented for judgment. The form of the judgment functions also
varied systematically with the stimulus distributions. These results indicate
that adaptation level equations must incorporate special weightings for the
stimulus extremes and also for stimuli of intermediate value. Judgments represent
a compromise between the tendency to divide the range of stimuli into
proportionate subranges and the tendency to use different parts of the response
scale with proportionate frequencies.


THE relativity of judgment is one of the
most striking and suggestive of psychological
phenomena. The simplest examples
are from the field of perception. Many of 
the well-known illusions serve to demonstrate
the dependence of perception upon the physical
relationships between different features of
the environment. Thus, the perceived length 
of the Müller-Lyer line varies with the directions
and lengths of the arrows at its ends.
Such illusions are only occasionally of suffi- 
cient prominence to capture our attention, but 
the regularities of perception, the perceptual 
constancies, also depend upon the physical 
relationships among stimuli. The perception 
of an object tends to remain constant in spite 
of changes in its distance or illumination. 
Constancy is achieved to the degree that the 
relative size and the relative intensity of the 
object and features of its physical surround 
remain constant. Some such consideration of 
physical relationships must be incorporated 
into any specification of the conditions suffi- 
cient to determine the perceptual judgment. 
The present monograph describes an attempt 


1 Research supported by National Science Founda- 
tion Grants G-7485 and G-19847. Substantial con- 
tributions to various phases of this research were 
made by each of the following: Louise Marshall, 
T. Tanner, R. Calfee, A. Sandusky, Judith Brown, 
H. Vickers, M. Meissner, S. Wolfson, J. Ptak, and 
J. Lawson. 


to specify further those relationships between 
stimuli which determine perceptual judg- 
ments. 

Although the relevant stimulus relation- 
ships are easier to specify for simple percep- 
tual situations, the principles which underlie 
the relativity of judgment have some of their 
most suggestive applications with more com- 
plex, symbolic materials. Categorization is 
always made with respect to some context or 
frame of reference. A political position is 
judged liberal or conservative in relation to 
some distribution of political positions, so 
that the same expression of opinion may ap- 
pear liberal in one context, conservative in 
another. Clinical assessments of the degree 
of mental illness have meaning only in terms 
of specific norms which themselves reflect
previous observations over some range of be- 
havior. Even the most personal and absolute 
of moral judgments involve implicit com- 
parisons, as when the same crime seems more 
serious in one context than in another. Such 
systematic shifts in the judgments of specific 
samples of behavior have been demonstrated 
in experiments using social (e.g., Sherif & 
Hovland, 1961), clinical (Campbell, Hunt, & 
Lewis, 1957), and ethical judgments (Mc- 
Garvey, 1943). The results of these studies 
appear consistent with the judgmental rela- 
tivity illustrated by the simpler phenomena 
of perception. 




The objective of the perceptual research 
described in the present monograph is to dis- 
cover general principles of judgment which 
can be used to guide the analysis of any 
frame of reference—be it perceptual, cogni- 
tive, symbolic, or evaluative. The working 
assumption is that all judgments are made 
in accordance with the same basic rules. The 
research is designed to study a perceptual 
situation which is simple enough to permit 
precise identification of these rules. 

Previous attempts to formulate general 
laws of judgment have usually been based 
upon psychophysical experiments. Such work 
goes back at least as far as the early nineteenth
century statements of Weber’s law:
that judgments reflect the ratios of physical 
intensities rather than the absolute differences 
between them. Although Fechner’s basic law 
of psychophysics singles out the absolute 
threshold as the reference point against which 
each stimulus magnitude is compared, sub- 
sequent research has emphasized other fea- 
tures of the stimulus context. Thus, Holling- 
worth (1910) based his “law of the central 
tendency of judgment” upon a demonstration 
of systematic errors in the reproduction of 
lines varying in length, the reproduced 
lengths tending toward the average value of 
the preceding lines. Woodrow (1933) as- 
sumed a set or readiness for the average 
stimulus value, each judgment depending 
upon the relationship between the judged 
stimulus and this expected value. A similar 
hypothesis had been applied to affective judgments
by Beebe-Center (1932, p. 229) in the
form of his “law of hedonic contrast.” Beebe- 
Center used the results of experiments with 
odors and colors to support his assertion that 
the pleasantness of a stimulus varies inversely 
with the degree of pleasantness of previously 
presented stimuli. These later examples have 
in common the assertion that the scale of 
judgment is centered upon some average 
value of the distribution of stimuli which 
form the context for judgment. 


Theory of Adaptation Level


The prominence of the effects of stimulus 
context encourages attempts at further quan- 
tification of the general relationship between 


the scale of judgment and the distribution of 
the stimuli being judged. Helson (1938) and 
Johnson (1944, 1955) have specifically re- 
lated the center of the scale of judgment to 
the geometric or logarithmic mean of the 
physical values of the stimuli, and Helson 
(1947, 1959) elaborated upon this simple 
formulation so as to permit differential 
weighting of the stimulus values with respect 
to their effects upon judgment. Helson’s pri- 
mary dependent variable has been the adap- 
tation level, the stimulus value which appears 
neutral to the observer (i.e., the stimulus 
assigned the middle category of his scale of 
judgment). The judgments of other stimuli 
are assumed to depend upon this value so 
that the judgment of any given stimulus is 
determined by the ratio of the physical value 
of the stimulus to the subject’s (S’s) current 
level of adaptation. 

Helson’s theory treats the adaptation level 
as a weighted mean of all of the various 
stimuli affecting perception. In practice, at- 
tention has usually been directed to a series 
of stimuli presented for judgment, to un- 
judged background stimuli, and to similar 
stimuli which have been experienced in the 
recent past.² The weighting of these different
classes of stimuli depends upon the specific 
situation. Thus, if one stimulus is singled out 
as the standard, as in the method of constant 
stimulus differences, it would have a greater 
effect upon adaptation level (in this case the 
traditional point of subjective equality, PSE,
for that standard) than would the various 
comparison stimuli. An important feature of 
the theory, however, is its assumption that 
the effect of each stimulus (insofar as it has 
any effect at all) is to pull adaptation level 
toward its own value. Since the judgment of 
each stimulus varies directly with the ratio of 
its physical value to adaptation level, the 
judgments vary inversely with the adaptation 


2 Helson’s basic equation for adaptation level has
the following form:

$$AL = S^{a} B^{b} P^{c},$$

i.e., the adaptation level (AL) is equal to a weighted
geometric mean of the geometric means of the series
stimuli, S, the background stimuli, B, and past
stimuli, P. The exponents represent the weighting,
such that a + b + c = 1, and all three are positive.
This equation is elaborated in Helson (1959).




level. In this sense, the theory represents a 
quantification of the principle of perceptual 
contrast. Examples of assimilation can also 
be handled within the adaptation level frame- 
work by applying the weighted-mean equation 
so as to describe the central tendency of judg- 
ment (Parducci & Marshall, 1962). Assimi- 
lation occurs when judgment categories are 
anchored to stimuli whose PSEs have shifted 
toward a more central value of the distribu- 
tion of stimuli. 
When the effects of background stimulation 
and past experience are not of major concern 
and when there are no special instructions, 
standards, or anchoring stimuli specifying the 
labels to be applied to any given stimulus, 
the theory of adaptation level weights the 
different stimulus values only in accordance 
with their relative frequency of presentation. 
In such situations (e.g., with the method of 
single stimuli or so-called “absolute judg- 
ment”), the adaptation level is simply the 
mean of the respective stimulus values, scaled 
in accordance with their discriminability— 
usually logarithmically. Although the stimuli 
should ideally be weighted for recency of 
presentation (Parducci, 1954; Woodrow, 
1933), the usual practice has been to present 
the same series repeatedly in randomized or- 
ders, averaging the judgments of each stimu- 
lus over the entire series to counterbalance 
ordinal effects. Thus in the simplest case, 
each stimulus contributes to adaptation level 
(and thence to the scale of judgment) in the 
same way that each score in a distribution 
shares in the determination of the mean of 
the distribution. 
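
The weighted-mean formulation can be sketched in a few lines of Python. The code below assumes the form of Helson's equation given in the footnote, a weighted geometric mean with positive exponents summing to 1, and treats the simplest case as the geometric mean of the series stimuli alone. The function names, the stimulus values, and the weights are assumptions chosen for illustration.

```python
import math


def geometric_mean(values):
    """Geometric mean, i.e., the antilog of the mean of the log values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def adaptation_level(series, background, past, a, b, c):
    """Weighted geometric mean of the three classes of stimuli.

    a, b, c are the weights for the series, background, and past stimuli;
    they must be positive and sum to 1.
    """
    if min(a, b, c) <= 0 or abs(a + b + c - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    return (geometric_mean(series) ** a
            * geometric_mean(background) ** b
            * geometric_mean(past) ** c)


def simple_adaptation_level(series):
    """Simplest case: only the series stimuli matter, each presentation
    weighted equally on a logarithmic scale."""
    return geometric_mean(series)


# Hypothetical weights in grams; repeated values stand for repeated presentations.
series = [100, 100, 200, 400, 400]
al_simple = simple_adaptation_level(series)
al_weighted = adaptation_level(series, background=[300], past=[150],
                               a=0.6, b=0.3, c=0.1)
# The judgment of a stimulus is assumed to depend on its ratio to the adaptation level.
ratio_for_400 = 400 / al_simple
```

Because each stimulus enters through its logarithm, adding presentations of a heavy stimulus raises the adaptation level and lowers the ratio, and hence the judgment, for every other stimulus. This is the contrast effect the theory is meant to quantify.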
Judgment is relative. The significance of 
adaptation level theory is in its specification 
of the stimulus values to which each judg- 
ment is related. And in the simplest case, 
the theory asserts that the only relevant 
values are the stimulus to which the judgment 
is applied and the mean of the other stimuli 
presented for judgment. As a rough approximation,
this formulation appears consistent
with virtually all of the research on absolute 
judgment. Whenever the mean of the stimuli 
has been varied, the resulting adaptation level 
has varied in the same direction and the judgments
of any given stimulus have varied in
the opposite direction. 


Range-Frequency Compromise 


Since variation in mean is necessarily con- 
founded with variation in other features of 
the stimulus context, the relevant data (re- 
viewed in Guilford, 1954; Helson, 1959; 
Johnson, 1955) are also consistent with alter- 
native interpretations. Volkmann (1951) and 
Sherif and Hovland (1961) have ascribed a 
crucial role to the end values (i.e., the largest 
and smallest of the stimulus values presented 
for judgment), asserting that the judgment 
of any stimulus depends on the relationship 
between its physical value and the values 
of the two extreme stimuli. This emphasis 
upon the end values has a general appeal in 
that attention seems frequently to be directed 
at the extremes of experience. The extreme 
left and extreme right of the political spec- 
trum, for example, often seem to be more in 
the public eye and to exert influence out of all 
proportion to their numerical strength. Simi- 
larly, events characterized by the extremes of 
emotion, trauma or ecstasy, appear to have 
an especially significant role in the establish- 
ment of the frame of reference for our reac- 
tions to emotional experiences. 

The specification of the stimulus-response 
relationships in terms of the stimulus extremes 
is more easily accomplished with respect to 
simple psychophysical materials where we 
can deal with physical values along a single 
stimulus dimension. If the categories of judgment
were assigned to fixed proportions of
the range, symmetrical about the middle cate- 
gory, the adaptation level or middle category 
of the scale of judgment would center on the 
midpoint between the two stimulus extremes 
(i.e., their mean or one half their sum). In 
most of the absolute-judgment experiments 
in which the mean of the stimuli has been 
systematically varied, the midpoint has var- 
ied directly with the mean. However, Johnson 
(1944, 1949), Parducci (1956a, 1959), Parducci,
Calfee, Marshall, and Davidson (1960),
and Parducci and Marshall (1961a, 1961b) 
varied the stimulus frequencies while holding 
constant the end values (and hence the mid- 
point) of the distribution of stimuli presented 
for absolute judgment. Significant shifts were 
obtained (adaptation level varying directly 
with the mean) in spite of the fixed midpoints.
This indicates that even if the limits
of experience, the two extreme values, require 
special weighting in the adaptation level equa- 
tion, intermediate values also affect the scale 
of judgment. 

Another possibility is that different descrip- 
tive terms, categories of judgment, or portions 
of the response dimension, are used with fixed 
relative frequencies. It would seem a misuse 
of language to use only a limited portion of 
the total scale of judgment, to label every- 
thing “good” or every man “tall.” Rather, 
the proper use of judgmental terms seems to 
require some kind of balance; for some 
things to be good, others must be bad; the 
total number of tall men must be roughly on 
the same order as the total number of short 
men. 

In the simplest case, the two halves of the 
scale of judgment might be used with exactly 
equal frequency. Given a series of weights 
to be judged either “heavy” or “light,” the 
heavier half of the presentations would 
be called heavy and the lighter half light. 
The adaptation level, which in this case is 
the limen between the two categories, would 
be the median of the stimulus values. 

Although the median is usually further from 
the midpoint than from the mean for skewed 
distributions, all three measures tend to be 
positively correlated unless specific concern 
is taken to vary them independently. Could 
the median alone serve as a sufficient pre- 
dictor of adaptation level? With median held 
constant, adaptation level varies directly with 
variation in midpoint (Parducci et al., 1960; 
Parducci & Marshall, 1961a, 1961b); and 
thus the median, like the midpoint, can at 
best be only a partial predictor of the scale 
of judgment. 

Since the mean has been either equal to 
both the midpoint and median or between 
them for most of the distributions of stimuli 
used in experiments on judgment, it is possi- 
ble that the general success of adaptation 
level theory in relating the experimental re- 
sults to the mean of the stimuli has obscured 
basic tendencies of judgment. A range-fre- 
quency compromise has been proposed as an 
alternative interpretation of the context effects 
in judgment (Parducci et al., 1960). Two
tendencies of judgment are postulated: (a)
to divide the range of stimuli into proportion- 
ate subranges, each category of judgment 
covering a fixed proportion of the range; and 
(b) to use the categories of judgment with 
proportionate frequencies, each category being 
used for a fixed proportion of the total num- 
ber of judgments. The adaptation level is 
determined by a compromise between these 
two tendencies so that a weighted combination 
of the midpoint and median provides a better 
prediction of the center of the scale of judg- 
ment than the mean (which is itself only a 
rough approximation to such an intermediate 
value). It is also assumed that the other 
category limens reflect the compromise be- 
tween these two tendencies. 
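
The two postulated tendencies can be illustrated with a minimal sketch. It assumes that the compromise is a simple weighted average of the limens implied by each tendency taken alone. The function names, the weight parameter w, and the eight sample presentations are hypothetical and are not taken from the experiments reported here.

```python
from typing import List, Sequence


def range_limens(low: float, high: float, n_categories: int) -> List[float]:
    """Limens dividing the stimulus range into equal subranges (tendency a)."""
    step = (high - low) / n_categories
    return [low + step * k for k in range(1, n_categories)]


def frequency_limens(presentations: Sequence[float], n_categories: int) -> List[float]:
    """Limens giving each category an equal share of the presentations (tendency b)."""
    ordered = sorted(presentations)
    n = len(ordered)
    limens = []
    for k in range(1, n_categories):
        below, remainder = divmod(k * n, n_categories)
        if remainder == 0:
            # The cut falls between two presentations: take their average.
            limens.append((ordered[below - 1] + ordered[below]) / 2)
        else:
            # The cut falls on a single presentation.
            limens.append(float(ordered[below]))
    return limens


def compromise_limens(presentations: Sequence[float], n_categories: int, w: float) -> List[float]:
    """Weighted average of the two sets of limens; w is the assumed range weight."""
    r = range_limens(min(presentations), max(presentations), n_categories)
    f = frequency_limens(presentations, n_categories)
    return [w * a + (1 - w) * b for a, b in zip(r, f)]


# Hypothetical positively skewed series of eight presentations.
series = [1, 1, 1, 2, 2, 3, 5, 9]
print(range_limens(1, 9, 2))              # single limen at the midpoint of the range
print(frequency_limens(series, 2))        # single limen at the median
print(compromise_limens(series, 2, 0.5))  # equal weighting, chosen only for illustration
```

With two categories the range limen is the midpoint and the frequency limen is the median, so the compromise limen necessarily falls between them.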

Support has been obtained for this com- 
promise theory through experiments in which 
the midpoint, median, and mean have been 
independently manipulated for absolute judg- 
ments. Using some 30 different distributions 
of numerals, each distribution composed of 45 
three-digit numerals presented simultaneously 
on a single page, it was found that the adap- 
tation level (the mean of the numerals judged 
“medium”) varied significantly with inde- 
pendent variation of either the midpoint or 
the median of the stimulus distributions but 
that variation of the mean had little effect 
upon judgment when these other two para- 
meters were held constant (Parducci et al., 
1960). When the adaptation levels for all 30 
distributions were included in a multiple- 
regression analysis, with various combinations 
of the midpoint, median, mean, and range 
as predictor variables, it was found that only 
the midpoint and median made significant 
contributions to the prediction of adaptation 
level. Significantly better prediction could be 
made from a combination of the midpoint 
and median (their weighting being .55 and 
.45, respectively) than from either alone or 
than from the mean alone. Similar conclu- 
sions were drawn from the data obtained when 
some of the same sets of numerals were pre- 
sented orally by the traditional method of 
single stimuli (Parducci & Marshall, 1961a) 
and when different sets of lines were presented 
for judgments of length, again with all stimuli 
simultaneously present (Parducci & Marshall, 
1961b).
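
As a rough illustration of the weighted combination, the following sketch applies the reported weights (.55 for the midpoint, .45 for the median) to a hypothetical block of presentations. The frequencies and the function name are assumptions made for the example. The weights were estimated for the numeral study and need not hold for other materials.

```python
from statistics import mean, median

# Weights reported for the numeral study (Parducci et al., 1960).
MIDPOINT_WEIGHT = 0.55
MEDIAN_WEIGHT = 0.45


def predicted_adaptation_level(presentations):
    """Weighted combination of the midpoint and median of the presented values."""
    midpoint = (min(presentations) + max(presentations)) / 2
    return MIDPOINT_WEIGHT * midpoint + MEDIAN_WEIGHT * median(presentations)


# Hypothetical positively skewed block of 45 presentations on a 1-9 scale:
# frequencies of 12, 9, 6, 6, 3, 3, 2, 2, 2 for values 1 through 9.
frequencies = [12, 9, 6, 6, 3, 3, 2, 2, 2]
block = [value for value, f in enumerate(frequencies, start=1) for _ in range(f)]

print("midpoint :", (min(block) + max(block)) / 2)
print("median   :", median(block))
print("mean     :", mean(block))
print("predicted:", round(predicted_adaptation_level(block), 2))
```

In this illustration the weighted value lies above both the median and the mean, because the midpoint is unaffected by the skew of the frequencies.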




Plan of the Present Research 


The stimulus materials and procedures for 
the experiments supporting the range-fre- 
quency compromise had heretofore been cho- 
sen so as to permit economical exploration of 
a variety of stimulus distributions, with inde- 
pendent variation of the alternative measures 
of central tendency. Since the results were 
encouraging, it was determined to employ the 
same logic in the design of further research— 
this time with more traditional stimulus ma- 
terials and the method of single stimuli. The 
present study uses three perceptual dimen- 
sions (numerousness, size, and weight) with 
independent variation of the midpoint, me- 
dian, and mean for each dimension. For most 
conditions, the stimuli are presented singly 
in a long series in which the same probability 
distribution is repeated for successive blocks 
of presentations. 

The range-frequency theory yields the 
prediction that adaptation level varies directly 
with variation of either the midpoint or the 
median. Insofar as the categories divide the 
difference between the smallest and largest 
stimuli into fixed proportions, the category 
limens and the adaptation level must vary 
directly with variation in the mean of the 
two end values, the midpoint. Insofar as the 
categories are used with proportionate fre- 
quencies, those in the upper half of the re- 
sponse scale being used with the same fre- 
quency as those in the lower half, the middle 
limen or adaptation level must fall at the 
median value of the distribution of stimuli. 
Even if there is a slight tendency to use one 
half of the response scale (e.g., the three 
lower categories) more frequently than the 
other half, the adaptation level should vary 
directly with variation in the median; the
other category limens might also shift, though 
not necessarily in the same direction since 
their values would depend upon the relative 
frequencies of presentation of stimulus values 
over different regions of the range. Variation 
in the mean of the stimulus distribution 
should have no effect upon adaptation level 
unless either the midpoint or median were 
also varied. However, the proportionate-fre- 
quency tendency might affect other points on 
the scale of judgment since independent variation
of the mean would necessarily entail
differences in the stimulus frequencies. And 
whenever the stimuli are presented with vary- 
ing frequencies, there would have to be some 
kind of compromise between the two postu- 
lated tendencies of judgment since differences 
in frequencies would produce category widths 
which differed from those produced by the 
proportionate-subrange tendency. 

Variations in the distribution of the stimuli 
presented for judgment thus constitute the 
major independent variable for this research, 
with the variations being characterized in 
terms of alternative measures of central tend- 
ency—the midpoint, median, and mean. The 
effects of these variations are studied with 
three perceptual dimensions in order to pro- 
vide a broader base for evaluating the gen- 
erality of the range-frequency theory. This 
theory yields no predictions about the inter- 
action between the effects of distribution and 
dimension except insofar as additional as- 
sumptions are made about the relative 
strengths of the proportionate-subrange and 
proportionate-frequency tendencies (e.g., be- 
cause of differences in the discriminability of 
the stimuli) for different stimulus dimensions. 
Otherwise, the same distribution effects should 
be obtained for all three dimensions. 

The method by which Ss record their judg- 
ments constitutes a third independent varia- 
ble. Two different methods of recording are 
used for many combinations of distribution 
and dimension. This third variation was designed
to manipulate S’s information concerning
the relative frequencies with which he has
used the different categories of judgment. The 
assumption is that with increased information 
about category frequencies, there is a relative 
increase in the strength of the proportion- 
ate-frequency tendency. Thus, it is hypoth- 
esized that the effects of variation in the 
median of the stimulus distribution interact 
with the effects of variation in the method 
of recording, the effects of the median being 
greater when S is provided with more infor- 
mation about his category frequencies. 

Variations are also introduced in the in- 
structions for judgment, particularly with 
respect to the set of response categories. For 
most of the experimental conditions, Ss are 
restricted to either five or six verbal categories.
However, for some of the conditions,
additional groups of Ss are free to generate 
their own scales of magnitude estimation, 
using as many different numerals as seem 
appropriate. It has been asserted that con- 
text effects are minimal with magnitude-esti- 
mation scales (Stevens & Galanter, 1957). 
The purpose of varying the response scales 
is to determine whether the range-frequency 
compromise operates with magnitude estima- 
tions as well as with category judgments. 

This research has extended over a period 
of several years. A number of additional, rel- 
atively minor variations have inevitably crept 
into the research design since the full scope 
of the project was not foreseen when the work 
began. Some of these are not investigated 
systematically. Thus, the effects of the shift 
from five to six categories of judgment are 
confounded with differences in stimulus di- 
mension. Others are systematically evaluated 
in combination with only some of the other 
experimental variations. Since it became in- 
creasingly clear that the effects of the specific 
order in which the stimuli were presented 
could interact with the effects of the other 
experimental variations, the number of differ- 
ent subgroups of Ss, each judging the same 
stimuli in different orders, tended to increase 
as the research progressed. 

The original planning of this research was 
directed toward the evaluation of the theory 
of adaptation level, with the range-frequency 
compromise considered as a possible basis for 
further elaboration of the theory. Adaptation 
level is thus the major dependent variable 
for all of the experimental conditions, and the 
results of statistical analyses are always re- 
ported first with respect to differences in 
adaptation level. However, interest in the 
effects of the major variables upon other fea- 
tures of the scale of judgment increased as 
the research progressed. It became clear that 
the theory of a compromise between propor- 
tionate-subrange and proportionate-frequency 
tendencies could be more powerfully evalu- 
ated in terms of the entire scale of judgment 
rather than in terms of just the central value, 
adaptation level. Consequently, scales of 
judgment (either mean judgment or mean 
stimulus functions) are graphically presented 
and occasionally subjected to statistical comparison.
The mathematical relationship between
these alternative sets of scale values
(i.e., the mean of the stimuli in each category 
versus the mean of the judgments applied to 
each stimulus) is complex enough so that it 
seems appropriate to present samples of both 
types of function. However, predictions from 
the range-frequency compromise are more 
readily derived in terms of the mean judg- 
ment functions; and these add more to the 
generality of the results since they do not 
include the value of the adaptation level (de- 
fined below as a mean of stimuli). Conse- 
quently, evaluations of the mean judgment 
functions are in general used to supplement 
the statistical analyses of the differences in 
adaptation level. 
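
The distinction between the two types of function can be made concrete with a minimal sketch. The function names and the six (stimulus, judgment) pairs are hypothetical. The sketch only shows that the same record of judgments yields two different sets of scale values depending on which variable is averaged.

```python
from collections import defaultdict
from statistics import mean


def mean_judgment_function(trials):
    """Mean of the judgments applied to each stimulus value."""
    by_stimulus = defaultdict(list)
    for stimulus, judgment in trials:
        by_stimulus[stimulus].append(judgment)
    return {s: mean(js) for s, js in sorted(by_stimulus.items())}


def mean_stimulus_function(trials):
    """Mean of the stimulus values placed in each category."""
    by_category = defaultdict(list)
    for stimulus, judgment in trials:
        by_category[judgment].append(stimulus)
    return {c: mean(ss) for c, ss in sorted(by_category.items())}


# Hypothetical (stimulus, judgment) pairs for a single S.
trials = [(1, 1), (1, 2), (2, 2), (3, 2), (3, 3), (4, 3)]
print(mean_judgment_function(trials))   # one value per stimulus
print(mean_stimulus_function(trials))   # one value per category
```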


METHOD 
Stimuli 


Physical Dimensions. Three different dimensions 
are used, with nine stimulus values selected for each 
dimension. The research began with judgments of 
the numerousness of dots, projected from black-white 
transparencies upon a standard white screen before 
groups of Ss. The projected diameter of each dot 
was 4.5 cm., with from 8 to 122 dots in nine different 
patterns. Each pattern consisted of black dots, 
somewhat evenly spread over a 110 X 165 cm. white 
field, projected on a 180 X 240 cm. screen. The screen 
was approximately 15 ft. from Ss in a semidarkened 
room. 

The dimension of numerousness was also adapted 
for simultaneous presentation by scaling each pattern 
down to a 1346X 13%% in. rectangle. These small 
rectangles were then arranged so that 45 of them
could be printed in nine rows, of 5 rectangles each,
on a single 8 X 11 in. page. A facsimile of the nine,
scaled-down patterns is shown in Figure 1. Each
projected pattern consisted of dots in the exact locations
used for the patterns with fewer dots plus the
additional number required for that pattern. The
same pattern, presented in the same up-down, left-right
orientation, was used on all presentations of
each stimulus value.

Fig. 1. The nine patterns of dots used for both
successive and simultaneous presentations.

Further research was conducted using judgments of 
the sizes of squares projected under the same con- 
ditions as the dots. The nine black squares on the 
110 X 165 cm. white field varied in width from 5.4 
to 23.2 cm. Care was taken to insure that each 
square was centered with respect to its surrounding 
field. 

Some 15-20 different photographs were made of 
each of the nine different squares. However, only one 
master photograph was made of each dot pattern so 
that the slides used in the actual sequence of presen- 
tations of the dots were duplicates of the master 
photographs. Thus, the specific arrangement of each 
constellation of dots and also irregularities (if any) 
in the master photographs would be repeated at each 
presentation of that stimulus. This would tend to 
facilitate identification of the specific stimulus values 
for the dots as compared with the squares. 

The third dimension was lifted weight. The nine 
weights, varying from 40 to 250 gm., were presented 
by means of an apparatus constructed in accordance 
with the description reported by Sherif, Taub, and 
Hovland (1958). The apparatus served to block 
S’s view of the weights and of the experimenter (E), 
the lifting being accomplished by pulling down a 
handle which was attached to a cord leading over a 
pair of low-friction wheels to the weight. 

Stimulus Values. The nine stimulus values for each
dimension are presented in Table 1. They were se- 
lected so as to yield equal intervals on a scale of 
category judgments when presented with equal fre- 
quency. The steps between successive stimulus values 
are logarithmically equal for the weights and roughly 
logarithmic for the squares and dots. The squares 
were selected on the basis of data obtained from 
a pilot study on judgments of the sizes of square 
cards. The pilot study followed the procedure for 
“pure” category scaling using successive iterations, as 
described by Stevens and Galanter (1957, pp. 381- 
382). Since the second iteration resulted in a marked 
overcorrection (i.e., greater deviation from a “pure”
category scale than did the first, suggesting that the 
procedure produces oscillation around the pure val- 
ues), the nine stimuli actually selected represent a 
compromise between the values used for the first and 
second iterations. The procedure for selecting the 
dots was less formal, but it also produced a fairly 
pure category scale (i.e. with roughly equal inter- 
vals—see Figure 2). 
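
Logarithmically equal steps of the kind described for the weights can be sketched as follows. The sketch assumes that the end values of 40 and 250 gm. quoted in the text are exact. The values tabulated in Table 1 may differ slightly, and the function name is hypothetical.

```python
def log_spaced(low, high, n):
    """Return n values from low to high with a constant ratio between neighbors."""
    ratio = (high / low) ** (1 / (n - 1))
    return [low * ratio ** i for i in range(n)]


# Nine weights from 40 to 250 gm. (end values assumed exact for illustration).
weights = log_spaced(40, 250, 9)
for number, grams in enumerate(weights, start=1):
    print(number, round(grams, 1))
```

Under this assumption the fifth stimulus falls at the geometric mean of the two end values, that is, at 100 gm.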

The three sets of stimuli were also scaled using
the method of successive intervals as described by
Torgerson (1958, under Condition D of the law of
categorical judgment, pp. 236-240). The results from
this discriminability scaling (Figure 3) are in close
agreement with those from the mean judgments, as
indicated by the linearity of the functions relating the
values on a Thurstone scale of equal discriminability
to the mean judgments (Figure 4). The mean of the
discrepancies between the actual cumulative proportions
(i.e., the proportions indicating the judgments
of the stimuli relative to the category limens) and
the theoretical proportions based upon the best-fit
scale values derived from the model was .029 for the
dots, .021 for the squares, and .024 for the weights.
These discrepancies are of the same order of magnitude
as those reported for the method of paired
comparison by Edwards and Thurstone (1952) who
advocate this use of the discrepancies as an index of
the goodness of fit of the discriminability model.

Fig. 2. Mean judgments of numerousness and size
(random recordings) and weight (experimenter recording)
for the stimuli presented with equal frequencies
(Rectangular). Abscissa: stimulus; ordinate: mean judgment.

Fig. 3. Equal-discriminability scaling of numerousness,
size, and weight for the conditions shown in
Figure 2. Abscissa: stimulus; ordinate: scale of equal
discriminability.

Fig. 4. Relationship between equal-discriminability
scaling and mean judgments for the conditions shown
in Figures 2 and 3.

TABLE 1. Distributions of Stimuli with Their Physical
Values, Frequencies of Presentation, and Distribution
Parameters. Columns: physical value of each of the
nine stimulus numbers for numerousness (dots), size
(width in cm.), and weight (gm.); stimulus frequencies
for each distribution; midpoint (Mp), median (Md), and
mean (M); and geometric means for numerousness, size
(cm.), and weight (gm.).
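
The discrepancy index described above can be sketched as follows. The sketch assumes unit normal discriminal dispersions, so that the theoretical proportion of judgments of a stimulus falling below a category limen is the normal integral of the difference between the limen and the scale value. It also takes the discrepancies as absolute differences. The function names and the sample proportion are hypothetical.

```python
from statistics import NormalDist


def theoretical_proportions(scale_values, limens):
    """Cumulative proportion of each stimulus judged below each category limen."""
    phi = NormalDist().cdf  # unit normal integral (assumed model)
    return [[phi(limen - value) for limen in limens] for value in scale_values]


def mean_discrepancy(observed, theoretical):
    """Mean absolute difference between observed and theoretical proportions."""
    differences = [
        abs(o - t)
        for observed_row, theoretical_row in zip(observed, theoretical)
        for o, t in zip(observed_row, theoretical_row)
    ]
    return sum(differences) / len(differences)


# Hypothetical case: one stimulus scaled at 0, one limen at 0, observed proportion .53.
expected = theoretical_proportions([0.0], [0.0])
print(round(mean_discrepancy([[0.53]], expected), 3))
```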

The nine stimuli were assigned the values 1 through 
9 for each of the dimensions, and all computations 
were performed upon these values unless otherwise 
stated. The midpoint, median, and mean of each 
distribution of stimuli (see Table 1) are also given 
in terms of these rescaled values. 

Stimulus Distributions. The distributions of stimuli 
are shown in Table 1. Their selection was restricted 
by the following rules: (a) only the nine stimulus 
values for each dimension could be used; (b) the 
complete frequency distribution could be presented in 
each successive block of 45; and (c) with the exception
of the Rectangular distribution, a second, mirror-image
distribution was constructed for each distribution,
e.g. Low and High Midpoint, Low and High
Median, Low and High Mean. Thus, the three lowest 
values are each presented twice in the Low-Mid- 
point distribution, and the three highest values are 
presented twice in the High-Midpoint distribution; 
the two highest and two lowest, respectively, were 
omitted for these two distributions. Within these 
restrictions, the key to the selection of each distri- 
bution is the value of either its midpoint, its median, 
or its mean. For most pairs of distributions, two 
of these parameters have the same value (viz., of the 
fifth stimulus) while the third parameter varies. Care 
was taken that the geometric means of the actual 
physical values (i.e., number of dots, width in cm.,
weight in gm.) were approximately equal for each 
pair of equal-mean distributions selected for evalua- 
tion of the effects of differences in either their mid- 
points or medians. 


Presentation Intervals 


With the exception of the simultaneously presented 
patterns of dots (mimeographed), all presentations 
were by the method of single stimuli. The successive 
stimuli were projected upon the screen using an 
automatic projector which switched slides every 6 sec. 
(the slide changing taking less than .5 sec.). An ex- 
perimental session with 135 presentations took less 
than .5 hr., including time for instructions and chang- 
ing slide cartridges (the latter taking about 10 sec.). 
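
The session length follows from the presentation interval, as the arithmetic below shows. The sketch assumes that every presentation, including the 45 preview presentations, occupies the full 6-sec. interval. The function name is hypothetical.

```python
SECONDS_PER_PRESENTATION = 6


def presentation_minutes(n_presentations):
    """Minutes of projection time for a given number of presentations."""
    return n_presentations * SECONDS_PER_PRESENTATION / 60


print(presentation_minutes(45))    # preview block
print(presentation_minutes(90))    # judged presentations
print(presentation_minutes(135))   # full session of 135 presentations
```

Projection time thus accounts for roughly half of the 30 min., leaving the remainder for instructions and changes of slide cartridges.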


Instructions 


Instructions were read aloud by E to small groups 
of Ss (projected dots and squares) or to the indi- 
vidual S (lifted weights). The instructions for dots 
and squares differed only with respect to the dimen- 
sion for judgment. They began as follows: 


This is a study of how people judge the numerous- 
ness of dots [or the “sizes of squares”]. Before I 
read you the instructions for judgment, I am going 
to show you the series of dot patterns [or 
“squares”] you will later have to judge. Watch 
carefully. 


The 45 stimuli of the distribution were then suc- 
cessively projected upon the screen, the preview 
presentations taking approximately 5 min. After the 
first sentence of the instructions had been reread, E
distributed response sheets and continued as follows: 


A series of dot patterns [or “squares”] will be 
projected on this screen. With each pattern, you 
are to record how numerous you judge the dots 
to be [or “how large you judge the squares to 
be”], in comparison with the other patterns you
are shown. Use one of the six [or “five”] categories
printed at the top of your response sheet for each 
judgment: Very many—6, Many—5, Slightly more 
than average—4, Slightly fewer than average—3, 
Few—2, Very few—1 [for the five-category scale: 
“Very many—5, Many—4, Average—3, Few—2,
Very few—1”; “Few” was changed to “Small” or
“Light,” and “Many” was changed to “Large” or 
“Heavy” for the size and weight judgments, re- 
spectively]. Thus if it seems to you that there are 
very few dots in the presented pattern, in com- 
parison with the other patterns, record the number 
1 as your judgment for that presentation. If the 
pattern seems to have slightly more than the 
average number of dots, write down a 4. Thus for 
each presentation you will write down the number 
1, 2, 3, 4, 5, or 6, depending on whether there 
appear to be very few, few, slightly fewer than 
average, slightly more than average, many, or very 
many dots in the presented pattern, compared with 
the numbers of dots in the other patterns you are 
shown here. Do not go back and change your 
earlier responses after subsequent patterns have 
been shown. This study is concerned only with 
your immediate judgment of each pattern, right at 
the time it is presented. 


In addition to the categories of judgment, each of 
the random recording response sheets contained spaces 
for 90 responses, numbered from 1 to 90, so that 
Ss could keep pace with the stimulus presentation- 
number (called out by E as each stimulus was 
projected). 

Similar instructions were read to Ss lifting weights. 
However, there was no preview of the weights. Each
judgment was announced aloud by S, and E recorded
it himself. A list of the six (or five) categories was
visually present for S during the weight lifting sessions.

Column Recording. Special response sheets were 
substituted for the regular sheets when the cate- 
gories were to be segregated by columns. In this 
case, the six judgment categories were printed on a 
single line across the top of the sheet, with unnum- 
bered boxes forming columns under each of the six 
categories (60 response boxes under each category 
label). For these column recording conditions, the 
instructions were altered as follows: 


Use one of the six categories printed across the 
top of your response sheet for each presentation.
Thus if it seems to you that there are very few 
dots in the presented pattern, in comparison with 
the other patterns, record the presentation number 
(which I will call out for the pattern) in the first 
column at the left of your page. If the pattern 
seems to have slightly more than the average 
number of dots, you would write its presentation 
number in the fourth column . . . etc. 


The objective of this variation in recording proce- 
dure was to manipulate S’s opportunity for assessing 
the relative frequencies with which he used the dif- 
ferent categories. This objective will be further elab- 
orated upon in the Discussion section. 

The special instructions for column recording were 
also used for two of the weight lifting conditions. 
The column recording, weight lifting Ss had to record 
their own judgments in the appropriate columns, re- 
leasing the handle of the weight apparatus in order 


to write down the presentation number for each 
judgment, 

Magnitude Estimations. Different sets of instruc- 
tions were used to allow Ss greater freedom in the 
selection of the judgmental responses. Four of the 
dot-judging conditions required that the scales of 
judgment be centered at zero (instead of three or 
between three and four), with no restrictions on the 
number of alternative categories. The following in- 
structions were inserted in place of the five- or six- 
category instructions for the zero-centered scales
of judgment: 


Record your judgment in numerical form, with 
zero as the middle judgment value. Thus, if it seems 
to you that there are an average number of dots 
in the presented pattern, in comparison with the 
other patterns you have seen, write a zero in the 
place for that presentation. For patterns with fewer 
dots than average, record a negative number (that 
is, put a minus sign in front of your number: —2, 
—4, —10, —18, or whatever best indicates the 
relative scarceness of dots in the pattern). If it 
seems to you that the pattern contains more dots 
than average, record a positive number (that is,
put a plus sign in front of the number: +1, +5,
+8, +20, or whatever best indicates the relative
numerousness of dots in the pattern). Thus, for 
each pattern, you will write down a number (a 
negative number, zero, or a positive number) 
which represents your judgment of the relative 
numerousness of dots in the pattern. It should be
noted that since a zero is to be recorded for a 
pattern with an average number of dots, the num- 
bers do not represent the actual number of dots. 
Rather, you are developing your own psychological 
dimension of apparent numerousness, with each 
numerical judgment standing for your own psycho- 
logical impression of the relative numerousness of 
the dots in the presented pattern. 


For the 1,000-centered, dot and square conditions, 
Ss were required to center their scales at 1,000. The
previous instructions for the 0-centered scale were
modified to make them more similar to those used 
by Stevens (1956) in the method of magnitude esti- 
mation. In addition to the substitution of 1,000 for 0, 
the following substitution was made in place of the 
corresponding section of the previous instructions: 


For patterns with fewer dots than average, you 
would write smaller numbers representing the 
relative scarceness of dots. If it seems to you that 
the pattern contains more dots than average, re- 
cord the number which best indicates the relative 
numerousness of the dots in the pattern—in this 
case, the number would be greater than 1,000. 


For the background-1,000 conditions, Ss were in- 
structed that 1,000 represented the size of a square 
just large enough to cover the constant lighted back- 
ground around each of the black squares. 


The squares you have previewed and will be judging
are all much smaller, so you will judge them
with smaller numbers. It is the subjective sensa- 
tion of size that you are recording, not some physi- 
cal unit. Since each square is exposed for only 
5 seconds, you would not have time to perform 
geometric calculations to arrive at your judgments. 
Just record whatever number seems most appropri- 
ate as an index of how large the presented square 
appears—remembering that 1,000 is the value you 
would assign the largest possible square, the one 
reaching from the top to the bottom of the 
lighted background area for each presentation.


A final variation of the method of magnitude estimation
was used for four of the successive-dot conditions.
The instructions for these conditions explicitly
required Ss to commit the stimulus error (i.e., to
describe the physical properties of the stimuli). With 
each presentation, Ss were to record their estimates 
of the actual number of dots projected upon the 
screen. 

Simultaneous Presentations. For the simultaneous- 
dot conditions, a page of mimeographed instructions 
was stapled over the stimulus page. These instruc- 
tions were similar to those for the successively pre- 
sented dots except that the simultaneous instructions 
required that each judgment be recorded within the 
rectangular pattern to which it was being applied. 
Judgments were to be made either in terms of the 
usual five-category scale of numerousness or using a 
more loosely defined, four-category scale (“1 to 4 in 
order of increasing numerousness”). 


Order of Presentation 


Previous work (Parducci, 1954, 1959; Parducci et 
al., 1960) had demonstrated marked ordinal effects 
for absolute judgments, and thus some effort was 
made to counterbalance for order and to permit as- 
sessment of the magnitude of its effects and possible 
interaction with the other experimental conditions. 
The choice of a particular order for each subgroup 
of Ss judging the projected dots or squares was re- 
stricted by the following six rules: (a) each distri- 
bution to be completely presented in each successive 
block of 45 presentations; (b) insofar as possible, the 
sampling of the distribution for each half of each 
block to be representative of the total distribution; 
(c) successive repetitions of the same stimulus value 
to be somewhat evenly spaced over the block of 45; 
(d) the stimuli for each block to be divided into 
two slide cartridges with 22 in one and 23 in the 
other, the order of presentation being 22-23 for the 
45 preview presentations and 22-23-23-22 for the 
regular series of 90 judged presentations; (e) the 
order within each slide cartridge to be reversed for a 
second subcondition for each order used; (f) for each 
distribution, the order to be the complement of the 
order for its mirror-image distribution so that if a 
Low-Midpoint condition began with Stimuli 4, 7, 
5, 2, and 6, in that order, the corresponding High- 
Midpoint condition began with Stimuli 6, 3, 5, 8, 
and 4 (i.e., the value of each High-Midpoint pres- 
entation being 10 minus the value of the stimulus 
for the corresponding Low-Midpoint presentation). 


The last two restrictions counterbalanced for pos- 
sible ordinal effects. This procedure maximizes the 
likelihood of demonstrating such effects by means of 
a significant interaction between order and distribu- 
tion. For if a particular order lowers the adaptation 
level for one distribution, the complementary order 
should raise the adaptation level for the mirror-image 
distribution. Assume, for example, that judgments 
tend to assimilate toward the value of immediately 
preceding judgments (as suggested in Parducci & 
Marshall, 1962). In the above example, Stimulus 5 
follows a higher stimulus (7) for the Low-Midpoint 
condition. By the rule for complementary orders, it 
must follow a lower stimulus (3) for the High-Mid- 
point condition. The reversal of these two orders 
puts Stimulus 5 after a lower stimulus (2) for the 
Low-Midpoint than for the High-Midpoint subcon- 
dition (where it follows Stimulus 8). Since the com- 
plementary orders constitute the same row in an 
Order X Distribution table for analysis of variance, 
if the first order tended to bring the adaptation levels 
together for the two distributions, the reversed 
order would tend to push the adaptation levels apart. 
This would produce an interaction between the effects 
of order and distribution. Interactions would also 
be maximized for any additional orders selected for 
the different pairs of distributions. 
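Rules (e) and (f) and the Stimulus 5 example can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, only the five opening stimuli quoted above are used, and the reversal is applied to this fragment as if it were a whole slide cartridge.

```python
def complement(order, total=10):
    """Mirror-image order under rule (f): each stimulus value s becomes total - s."""
    return [total - s for s in order]


def predecessor(order, stimulus):
    """Stimulus presented immediately before `stimulus`, or None if it comes first."""
    i = order.index(stimulus)
    return order[i - 1] if i > 0 else None


# Opening of the Low-Midpoint order quoted in the text.
low = [4, 7, 5, 2, 6]
high = complement(low)      # High-Midpoint order by rule (f)
low_rev = low[::-1]         # rule (e), applied to the fragment for illustration
high_rev = high[::-1]

for name, order in [("Low", low), ("High", high),
                    ("Low reversed", low_rev), ("High reversed", high_rev)]:
    print(f"{name:14s} {order}  Stimulus 5 follows {predecessor(order, 5)}")
```

Under the first pair of orders Stimulus 5 follows 7 (Low) and 3 (High); under the reversed pair it follows 2 and 8. This is the pattern the text relies on to produce an Order X Distribution interaction.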

A different order of presentation was used for 
each of either four or six subgroups of Ss judging 
the same distribution of squares or dot patterns. Each 
successive pair of Ss judging a given distribution of 
weights was exposed under a different order of pre- 
sentation. For the simultaneously presented dots, the 
patterns were arranged in order of increasing numer- 
ousness, starting with the upper left-hand corner, 
going down the leftmost column, then down the sec- 
ond column from the left, etc., with the densest pat- 
tern in the lower right corner of the page. 


Subjects 


More than 3,000 students from introductory classes 
in psychology at the University of California, Los 
Angeles, served as Ss for this research. The simul- 
taneous-dot materials were distributed in quasi-ran- 
dom order to a large class so that the data from the 
328 Ss, divided among the 11 simultaneous-dot con- 
ditions, were all collected from a single group. For 
the judgments of the projected dot patterns and 
also for the judgments of the squares, Ss served in 
subgroups of from 5 to 35 each, though almost all of 
the subgroups were composed of fewer than 10 Ss. 
The 290 Ss lifting weights were run individually. 


Method of Analysis 


Adaptation Level. The primary dependent vari- 
able for this research is the adaptation level, the 
middle value of the scale of judgment. This is the 
dependent variable with which both Helson and 
Johnson have been primarily concerned, and the 
predictions from the theory of adaptation level are 
clearest for this measure. 




The procedure for determining the adaptation level 
has varied from situation to situation, and there 
are frequently alternative procedures for computing 
the adaptation level from any given set of experi- 
mental data. For the present research, adaptation 
level is defined as the mean of the stimulus values to 
which the middle category of judgment (e.g., Medium 
or Average) is applied. It is computed for each S by 
taking the arithmetic mean of those stimuli (scaled 
1 through 9) to which he has applied the third cate- 
gory of the five-category scales. When the scale in- 
cludes a sixth category, separate means are deter- 
mined for the third and fourth categories; the 
adaptation level is then the mean of these two means. 
All computations are based on the last 45 judgments. 
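The definition above can be sketched in a few lines. This is a minimal illustration, assuming each S's judgments are available as parallel lists of stimulus values and category numbers; the function name and the sample data are hypothetical and are not taken from the study.

```python
from statistics import mean


def adaptation_level(stimuli, judgments, n_categories):
    """Mean stimulus value assigned to the middle category (or categories).

    stimuli      -- stimulus values (scaled 1 through 9), one per judgment
    judgments    -- category numbers given by one S, in the same order
    n_categories -- 5 or 6, the number of categories on the scale
    """
    by_category = {}
    for s, j in zip(stimuli, judgments):
        by_category.setdefault(j, []).append(s)
    if n_categories == 5:
        return mean(by_category[3])
    if n_categories == 6:
        # Mean of the separate means for the third and fourth categories.
        return mean([mean(by_category[3]), mean(by_category[4])])
    raise ValueError("only five- and six-category scales are handled here")


# Hypothetical judgments, not data from the study.
five = adaptation_level([3, 4, 5, 6, 7, 8], [2, 3, 3, 3, 4, 5], 5)
six = adaptation_level([2, 4, 5, 5, 6, 7, 8, 9], [1, 3, 3, 4, 4, 4, 5, 6], 6)
print(five, six)
```

The caller is expected to pass only the last 45 judgments. A S who never used a middle category would raise a `KeyError` in this sketch; how such cases were handled is not stated in the text.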

For those conditions in which dots were simul- 
taneously presented for judgment in terms of five 
categories, the adaptation level is the mean of the 
stimuli judged “3.” When these simultaneous-dot 
Ss were instructed to use four categories, the adap- 
tation level is the middle limen, the point where S 
stopped using “2” and started using “3.” When this 
limen coincides with a shift in stimulus values, it is 
determined by linear interpolation between the two 
bounding values (e.g., if it fell between Stimuli 3 
and 4, adaptation level would be 3.5). When the 
break in categories comes within a string of repeti- 
tions of the same dot pattern, interpolation is made 
between the upper and lower limens for that pattern 
in proportion to the number of its repetitions judged 
above and below the category limen (e.g., if Stimulus 
4 were judged “2” three times and “3” seven times 
on a four-category scale, the adaptation level would 
be 3/10 of the way between 3.5 and 4.5, or 3.8). 
Other methods of computing adaptation level are 
described in the text below, where the resulting adap- 
tation levels are compared with the adaptation levels 
obtained using the computational procedures just 
described. 
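The four-category interpolation rule can be written out as follows. This is a sketch under stated assumptions: the patterns are listed in order of increasing numerousness, S's categories never decrease along that order, and each stimulus value is taken to extend half a step on either side of its nominal value. The function name is hypothetical.

```python
def middle_limen(stimuli, judgments, lower=2):
    """Point where S stopped using `lower` and started using the next category.

    stimuli   -- stimulus values in order of increasing numerousness
    judgments -- categories given by S, assumed never to decrease along that order
    """
    first_above = next((i for i, j in enumerate(judgments) if j > lower), None)
    if first_above is None or first_above == 0:
        raise ValueError("S did not use categories on both sides of the limen")
    below, above = stimuli[first_above - 1], stimuli[first_above]
    if below != above:
        # Break coincides with a shift in stimulus value: interpolate halfway.
        return (below + above) / 2
    # Break falls within repetitions of one pattern: move up from its lower limen
    # in proportion to the repetitions judged at or below `lower`.
    repetitions = [j for s, j in zip(stimuli, judgments) if s == above]
    n_low = sum(1 for j in repetitions if j <= lower)
    return (above - 0.5) + n_low / len(repetitions)


# The example from the text: Stimulus 4 judged "2" three times and "3" seven times.
stimuli = [3, 3] + [4] * 10 + [5, 5]
judgments = [1, 2] + [2] * 3 + [3] * 7 + [4, 4]
print(round(middle_limen(stimuli, judgments), 2))  # 3.8
```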

Scale of Judgment. Further information about the 
effects of the different distributions is obtained from 




tabulation of the mean judgments (1-5 or 1-6) of 
each of the respective stimuli and the mean of the 
stimulus values (1-9) to which each category is 
applied. Representative functions are shown in the 
various figures and provide a basis for further dis- 
cussion. 

Statistical Analysis. Analyses of variance are per- 
formed upon the adaptation levels, in each case group- 
ing together as many conditions as possible to form 
a meaningful factorial design. The differences be- 
tween the distribution parameters constitute the 
independent variable of central interest. However, 
the order of presentation (i.e., particular sequence of 
stimuli), the stimulus dimension, and the method of 
recording are also included in the analysis whenever 
the appropriate conditions have been run. The re- 
sults for all significant comparisons will be cited in 
the text, so that for each of the analyses, any com- 
parisons not cited fall short of the .05 level of sta- 
tistical significance. 


RESULTS 
Midpoint 


Adaptation Level. The Low- and High- 
Midpoint distributions, with identical values 
for the mean and median stimuli, differ by 
two stimulus steps with respect to the mid- 
point (i.e., mean of the two end values). 
Table 2 indicates that for judgments of size 
and also for both simultaneous and succes- 
sive judgments of numerousness, the adapta- 
tion levels are consistently higher for the 
High-Midpoint than for the corresponding 
Low-Midpoint distributions. An analysis of 
variance was performed upon the adaptation 
levels for the successively presented dot and 


TABLE 2 

VARIATION IN MIDPOINT: ADAPTATION LEVELS FOR DIFFERENT STIMULUS DIMENSIONS 
AND METHODS OF RECORDING 

                         Numerousness a                          Size b                    Weight a
                    Random         Simultaneous         Random            Column         Experimenter
Distribution      N     M    SD     N     M    SD     N     M    SD     N     M    SD     N     M    SD
Low Midpoint    106  4.55   .43    28  4.63   .38    32  4.74   .41    32  5.07   .54    40  4.53   .59
High Midpoint   109  4.96   .52    27  5.38   .41    32  5.26   .60    32  5.33   .51    40  4.76   .61
Rectangular      43  4.69   .64    29  4.71   .47    33  5.16   .86    35  5.59   .67    10  4.60   .89

a Five categories. 
b Six categories. 




square conditions in which the random re- 
cording sheets were used for the Low- and 
High-Midpoint distributions. With two dis- 
tributions, two stimulus dimensions, and four 
basic sequences of presentation, the adapta- 
tion levels of 16 different groups (of eight 
Ss each) were treated in a 2 × 2 × 4 facto- 
rial design. The effects of the difference 
in distribution, low versus high midpoint, 
proved to be the major source of variance 
(F = 37.38, df = 1/112, p < .001). The 
effects of stimulus dimension, numerousness 
versus size, were also significant (F = 6.08, 
df = 1/112, p < .025), the squares consist- 
ently yielding higher adaptation levels than 
the dots. It should be noted that since the 
dots and squares were judged in terms of 
five and six categories, respectively, the 
effects of stimulus dimension are confounded 
with the effects of the number of categories. 
None of the other sources of variation ap- 
proached statistical significance. 

A separate analysis was performed for the 
16 square-judging conditions, with distribu- 
tion, sequence, and method of recording (ran- 
dom versus column) as the independent 
variables in another 2 × 2 × 4 factorial de- 
sign. The effect of the variation in midpoint 
was again the principal source of variance 
(F = 17.73, df = 1/112, p < .001), with the 
method of recording also significant (F = 
4.65, df = 1/112, p < .05). 

The difference in adaptation level between 
the Low- and High-Midpoint, simultaneous- 
dot conditions (five categories) was also 
highly significant (t = 6.94, df = 53, p < 
.001). The only comparison not showing 
marked variation in adaptation level with 
variation in midpoint was the Low- versus 
High-Midpoint conditions for lifted weights. 
A 3 × 2 analysis of variance, performed upon 
the dot, square, and weight adaptation 
levels (using only the random recording dot 
and square conditions), indicated significant 
overall effects for distribution, Low- versus 
High-Midpoint (F = 31.82, df = 1/202, p < 
.001), and also for dimension, dots versus 
squares versus weights (F = 8.04, df = 2/202, 
p < .01); the interaction between distribu- 
tion and dimension was not significant (F = 
1.89, df = 2/202, p > .10). This means that 
the effect upon adaptation level of the differ- 


ence in midpoint was not significantly smaller 
for the weights. 
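The error degrees of freedom reported above follow from the number of Ss minus the number of cells in each between-subjects design. The following sketch checks that arithmetic. The function name is hypothetical, and the total of 208 Ss for the 3 × 2 analysis is an assumption (the 128 random recording dot and square Ss plus the 80 Low- and High-Midpoint weight Ss of Table 2).

```python
from math import prod


def error_df(n_subjects, *levels):
    """Within-cells df for a between-subjects factorial: N minus the number of cells."""
    return n_subjects - prod(levels)


# 16 groups of eight Ss in a 2 x 2 x 4 design.
df_dots_squares = error_df(16 * 8, 2, 2, 4)
# Assumed N for the 3 x 2 analysis: 64 dot + 64 square + 80 weight Ss.
df_three_dimensions = error_df(64 + 64 + 80, 3, 2)
print(df_dots_squares, df_three_dimensions)  # 112 202
```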

The data are thus consistent with the hypo- 
thesis that adaptation level varies directly 
with midpoint. The least consistent values 
are the adaptation levels for the Rectangular 
distributions. Of these, only the adaptation 
level for the column recording size-judgments 
departs from the expected intermediate value 
(i.e., between the corresponding Low- and 
High-Midpoint values). The Ss under this 
condition were particularly reluctant to use 
their Very large category, with over a third 
omitting it entirely. 

Scales of Judgment. Characteristic differ- 
ences were found between the psychometric 
functions plotting the mean judgments of 
each stimulus for the Low- versus High-Mid- 
point distributions for squares and also for 
dots (both successive and simultaneous). The 
curves shown in Figure 5 are typical of these 
mean judgment functions: linear and with 
approximately equal slopes for the more fre- 


[Figure: mean judgment (ordinate) plotted against stimulus (abscissa), with separate curves for the High-Midpoint and Low-Midpoint distributions.] 

Fig. 5. Mean judgments of size for the Low- and 
High-Midpoint distributions with random recording. 




[Figure: mean stimulus (ordinate) plotted against category of judgment, 1 through 6 (abscissa), with separate curves for the High-Midpoint and Low-Midpoint distributions.] 

Fig. 6. Mean stimulus value (size) to which each 
category of judgment was assigned for the conditions 
shown in Figure 5. 


quently presented stimuli, but with some flat- 
tening at the tails of the respective distribu- 
tions. Similar curves were obtained for the 
weights, but the separation between the 
curves over the middle stimulus values is less 
for the weights. The mean of the stimulus 
values to which each category of judgment 
was applied is shown in Figure 6 for the two 
groups whose mean judgments were plotted 
in Figure 5. These mean stimulus curves are 
also linear and parallel over most of the scale. 
However, this second type of scaling involves 
cutoffs at Stimuli 3 and 7 for the Low- and 
High-Midpoint distributions, respectively. 
Summary. These results provide support 
for the proportionate-subrange part of the 
postulated range-frequency compromise. At- 
tention here was restricted to the role of the 
extreme stimuli, the largest or the smallest 
of the squares, the heaviest or the lightest of 
the weights, the pattern with the most or the 


fewest dots. The results demonstrate that 
the entire scale of judgment shifts up or 
down with the values of these extreme stimuli. 
This is the first such evidence presented with 
traditional perceptual materials using the 
method of single stimuli. 

The instructions emphasized that each 
stimulus was to be judged in relation to the 
other stimuli presented for judgment. Within 
this restriction, Ss could categorize the 
stimuli in whatever ways seemed appropriate. 
Previous investigators have asserted that each 
judgment is made with reference to some 
“average” value of the context stimuli (e.g., 
the proposals of Hollingworth, Woodrow, and 
Beebe-Center, as described above in the in- 
troduction). Much of the evidence which has 
been presented in support of the theory of 
adaptation level is consistent with the gen- 
eralization that the middle of the scale of 
judgment, the adaptation level, is fixed at 
the mean of these context stimuli. But if Ss 
actually organized their scales of judgment 
around the mean, the adaptation levels and 
the mean judgment functions would have been 
identical for the various midpoint conditions 
studied in the present section; for these dis- 
tributions have identical means. The observed 
variations in judgment indicate that the mean 
is not a sufficient index to the frame of ref- 
erence for judgment. The variations imply 
that the median also fails since the medians 
and means have the same values in the various 
midpoint distributions. 

It would appear that Ss apply their cate- 
gories to equal portions of the total range of 
stimuli, the portions being measured with re- 
spect to the scale of equal discriminability as 
represented by the numerals 1-9. To predict 
the scale of judgment, one needs only the 
extreme values or end points of the frame of 
reference and the number of categories at S’s 
disposal. The higher the extreme values (or 
their average, the midpoint), the higher the 
values of the adaptation level and of each 
of the category limens, and the lower the 
values of the mean judgments applied to the 
different stimuli. 
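The pure proportionate-subrange principle described in this paragraph can be made concrete with a short sketch. It is an idealization, not the author's computation: the function names are hypothetical, and the end points 1 to 7 and 3 to 9 are assumed values chosen only so that the two midpoints differ by two stimulus steps.

```python
def equal_subrange_limens(low, high, n_categories):
    """Category limens that divide the range [low, high] into equal subranges."""
    width = (high - low) / n_categories
    return [low + width * k for k in range(1, n_categories)]


def range_midpoint(low, high):
    """Midpoint of the range; the center of the scale under this principle alone."""
    return (low + high) / 2


# Hypothetical end points on the 1-9 scale (assumed, not taken from Table 1).
for label, (low, high) in {"Low-Midpoint": (1, 7), "High-Midpoint": (3, 9)}.items():
    limens = [round(x, 2) for x in equal_subrange_limens(low, high, 5)]
    print(label, "limens:", limens, "midpoint:", range_midpoint(low, high))
```

Under these assumptions every limen, and the center of the scale, is two steps higher for the High-Midpoint range. The obtained adaptation levels in Table 2 differ by much less than two steps, which is consistent with the qualification that follows in the text.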

This summary is somewhat oversimplified 
since, as shown in Figure 5, the mean judg- 
ment functions deviate systematically from 
linearity. Furthermore, the mean judgments 




of the stimulus extremes are not the same for 
the two distributions. It appears that the end 
categories are not always attached to the end 
stimuli. Rather, Ss tend to apply categories 
which are less extreme than those which would 
otherwise be applied to the truncated end of 
the distribution of stimuli (i.e., to use the 
second from the bottom category for the High- 
Midpoint distribution, the second from the 
top category for the Low-Midpoint distribu- 
tion). Since the frequencies of presentation of 
the stimuli are highest at the truncated ends, 
this suggests the operation of the proportion- 
ate-frequency tendency which is studied in the 
next section. 


Median 


Adaptation Level. The Low- and High-Me- 
dian distributions, with identical means and 
midpoints, differ by approximately one stimu- 
lus step with respect to their medians. As 
shown in Table 3, the mean adaptation levels 
are consistently higher for the High-Median 
conditions, in most cases by one half of a 
stimulus step. The adaptation levels are also 
higher when the responses are recorded in 
category columns rather than in random order. 
Separate analyses of variance were performed 
for the dot, square, and weight conditions, 
respectively. The analyses followed a 2 × 2 × 4 
design for the dots and squares, with distribu- 
tion, method of recording, and sequence (four 
different orders of presentation) as the inde- 
pendent variables. The effects of sequence 
were not analyzed for weights since only two 
Ss were used under each sequence. 


The effects of distribution (Low versus 
High Median) were highly significant in each 
of the three analyses (for dots: F = 25.13, 
df = 1/128, p < .001; for squares: F = 
16.41, df = 1/128, p < .001; for weights: 
F = 7.47, df = 1/116, p < .01). The effects 
of method of recording were significant only 
for the dots (F = 19.63, df = 1/128, p 
< .001). Contrary to expectations, none of 
the interactions between the effects of stimu- 
lus distribution and the method of recording 
the judgments was statistically significant. 

The Low- and High-Median distributions of 
dot patterns were also judged by groups in- 
structed to use unrestricted, zero-centered 
scales of judgment (with adaptation level 
computed as the mean of all stimulus values 
judged zero, i.e., neither negative nor posi- 
tive). The mean and standard deviation of 
these adaptation levels for the 37 Ss, combin- 
ing subgroups exposed to four different se- 
quences, are 4.75 and .64 for the Low-Median 
distribution, 5.16 and .77 for the High-Median 
distribution. The difference between the mean 
adaptation levels is significant (t = 2.41, df 
= 72, p < .02), with the respective adapta- 
tion levels very similar in value to those for 
the corresponding column recording condi- 
tions. These results add confidence concern- 
ing the generality of the effects of variation 
in the median of the distribution of stimuli 
since the procedure for determining the 
adaptation level and the specific responses 
involved are so different. 

Judgments of the numerousness of dots were 
also collected for presentations of the Low- 


TABLE 3 

VARIATION IN MEDIAN: ADAPTATION LEVELS FOR DIFFERENT STIMULUS DIMENSIONS AND 
METHODS OF RECORDING a 

                          Numerousness                            Size                               Weight
                    Random            Column            Random            Column         Experimenter         Column
Distribution      N     M    SD     N     M    SD     N     M    SD     N     M    SD     N     M    SD     N     M    SD
Low Median       36  4.38   .26    36  4.67   .38    36  4.92   .59    36  5.11   .80    40  4.73   .60    20  4.67   .43
High Median      36  4.72   .72    36  5.19   .54    36  5.33   .41    36  5.52   .50    40  5.02   .91    20  5.16   .72

a Six categories for all conditions. 




Median (dots) and High-Median (dots) dis- 
tributions (see Table 1). These had been the 
first distributions selected for investigating the 
effects of variation in median. They were not 
used for the squares and weights because of 
the relative unreliability of their adaptation 
levels (due to the relative absence of stimuli 
from the middle of the stimulus range). How- 
ever, the data collected for these two distribu- 
tions of dot patterns are presented here as 
additional evidence for the generality of the 
median effects. Only two (rather than the 
usual four) different sequences of stimuli were 
used for each distribution (one sequence being 
the reverse of the other) for the successively 
presented dots, and judgments were made in 
terms of five categories with random record- 
ing. The adaptation levels, shown in Table 4, 
differ in the expected direction (F = 3.72, df 
= 1/86, .05 < p < .10), though both adap- 
tation levels are lower than any of the other 
adaptation levels for judgments of numerous- 
ness, including the adaptation level for the 
Rectangular condition. 

The Low-Median (dots) and High-Median 
(dots) distributions were also investigated us- 
ing simultaneous presentations. As shown in 
Table 4, the differences are again in the ex- 
pected direction (t = 4.48, df = 60, p < .001 
for five-category judgments; t = 4.49, df = 
58, p < .001 for four-category judgments), 
with the Low- and High-Median adaptation 
levels bracketing the adaptation levels for the 
corresponding Rectangular condition. The 
four-category adaptation levels are of special 
interest since they represent sharp category 


shifts, each S breaking his category at a spe- 
cific stimulus value (usually where the stimu- 
lus patterns shifted to a higher, i.e., denser, 
value). As with the previously reported re- 
search on length of line (Parducci & Marshall, 
1961b), this adds confidence to the conclusion 
that the effects of median are not dependent 
upon the averaging process used in the compu- 
tation of adaptation level. For 36 additional 
five-category Ss, the page containing the dot 
patterns from the Low-Median (dots) distri- 
bution was stapled to the instruction page with 
the top-bottom orientation reversed so that the 
patterns decreased (rather than increased) in 
density, going from upper left to lower right. 
The mean adaptation level for this condition 
did not differ significantly from the adaptation 
level for the same page with the normal orien- 
tation (t = 1.09, df = 56, p > .10), provid- 
ing no evidence for an order-on-the-page effect 
(found with younger Ss judging numerical 
magnitude, Parducci et al., 1960). 

Scales of Judgment. The means of the 
stimuli judged with each of the categories are 
shown in Figure 7 for the random recording, 
square-judging groups. Similar functions were 
obtained for judgments of size, numerousness, 
and weight, both for random recording (or 
experimenter recording with weights) and for 
column recording. Each function is somewhat 
more linear for the random recording than 
for the column recording, the reduction in the 
distribution differences being greatest for 
weight and least for size. For all six pairs of 
conditions (i.e., Low versus High Median), 
the mean stimulus for four of the six categories 


TABLE 4 

VARIATION IN MEDIAN: ADAPTATION LEVELS FOR JUDGMENTS OF NUMEROUSNESS 

                                 Presentation and number of categories
                      Successive                Simultaneous
                         Five              Five              Four
Distribution          N     M    SD     N     M    SD     N     M    SD
Low Median (dots)    42  3.94   .87    36  4.10   .91    30  4.25   .17
High Median (dots)   48  4.34  1.10    26  5.21  1.04    30  5.35  1.10
Rectangular          43  4.69   .64    29  4.71   .41    30  4.88   .89




[Figure: mean stimulus (ordinate) plotted against category of judgment, 1 through 6 (abscissa), with separate curves for the High-Median and Low-Median distributions.] 

Fig. 7. Mean stimulus value (size) to which each 
category of judgment was assigned for the Low- and 
High-Median distributions with random recording. 


is higher for the Low-Median condition. How- 
ever, in each of the comparisons, the means 
of the stimuli judged “3” and “4” are higher 
for the High-Median condition. Since adap- 
tation level is defined in terms of these middle 
categories, the differences in adaptation level 
are actually in the opposite direction from that 
of the mean stimulus differences obtained for 
the other categories. 

The mean judgments of the middle stimu- 
lus, 5, were also tabulated for the two condi- 
tions shown in Figure 7. The mean and 
standard deviation of these judgments are 
3.60 and .40 for the Low-Median distribution, 
and 3.22 and .37 for the High-Median distri- 
bution. The direction of this difference is 
consistent with the computed difference in ad- 
aptation level (i.e., lower adaptation level 
goes with higher judgments) and is highly 
significant (t = 4.63, df = 84, p < .001). 
Thus in spite of the intertwining relationships 


between these scales of judgment, the differ- 
ences in the median are accompanied by 
marked differences in both the centering of 
the scales (adaptation level) and in the judg- 
ments of the middle stimulus of the distribu- 
tions. 

However, adaptation level is an incomplete 
and somewhat unrepresentative index of the 
scale of judgment for the Low- and High-Me- 
dian distributions of stimuli. The peculiarly 
complex relationship between these two scales 
of judgment is somewhat clarified by the hy- 
pothetical fixed-frequency functions plotted in 
Figure 8. These functions were based on the 
actual frequencies with which the respective 
categories of judgment were used by the 33 
Ss judging the Rectangular distribution of 
squares under random recording. The mean 
tabulated percentages of total use, going from 
Very small to Very large, were as follows: 


[Figure: hypothetical mean stimulus (ordinate) plotted against category of judgment (abscissa), with separate curves for the High-Median and Low-Median distributions.] 

Fig. 8. Hypothetical mean stimulus value for the 
conditions shown in Figure 7, assuming that each 
category was used with the same frequency as for the 
Rectangular distribution of squares with random 
recording. 




12.7, 19.4, 19.2, 20.7, 20.3, and 7.7. The 
functions in Figure 8 represent what the 
means of the stimuli judged with each category 
would have been if Ss judging the Low- and 
High-Median distributions used the categories 
with the same frequencies as the categories 
had been used by Ss judging the Rectangular 
distribution (also assuming that each S kept 
a perfectly ordinal scale). Each of the ob- 
tained differences plotted in Figure 7 appears 
in exaggerated form in Figure 8. Had Ss used 
their categories merely to mark off proportion- 
ate subranges, the two functions in Figure 7 
would actually have differed in the opposite 
direction (determined by constructing a sec- 
ond pair of hypothetical mean stimulus func- 
tions showing what the mean stimulus would 
have been for each category had the category 
limens been the same for these conditions as 
for the corresponding Rectangular condition). 
The mean stimulus functions shown in Figure 
7 are thus consistent with the hypothesis that 
Ss tend to use their categories of judgment 
with fixed relative frequencies. 
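The construction of the hypothetical fixed-frequency functions can be sketched as follows, assuming a perfectly ordinal scale as the text does. The category percentages are those reported above for the Rectangular distribution of squares; the function name and the presentation frequencies are hypothetical stand-ins, since the actual distributions (Table 1) are not reproduced in this section.

```python
def fixed_frequency_means(presentations, shares):
    """Mean stimulus per category when categories are used with fixed relative frequencies.

    presentations -- dict mapping stimulus value to its number of presentations
    shares        -- proportion of all judgments given to each category, lowest first
    Assumes a perfectly ordinal scale: lower stimuli never get higher categories.
    """
    total = sum(presentations.values())
    pool = [[s, float(n)] for s, n in sorted(presentations.items())]
    i = 0                      # index of the lowest stimulus with presentations left
    means = []
    for share in shares:
        need = share * total   # number of judgments this category must absorb
        mass = weighted = 0.0
        while need > 1e-9 and i < len(pool):
            take = min(pool[i][1], need)
            weighted += pool[i][0] * take
            mass += take
            need -= take
            pool[i][1] -= take
            if pool[i][1] <= 1e-9:
                i += 1
        means.append(weighted / mass if mass > 0 else None)
    return means


# Percentages of use reported for the Rectangular distribution, Very small to Very large.
rectangular_shares = [p / 100 for p in (12.7, 19.4, 19.2, 20.7, 20.3, 7.7)]
# Hypothetical skewed distribution of 45 presentations (not the values of Table 1).
skewed = {1: 9, 2: 7, 3: 6, 4: 5, 5: 4, 6: 4, 7: 4, 8: 3, 9: 3}
print([round(m, 2) for m in fixed_frequency_means(skewed, rectangular_shares)])
```

Fractional presentations are allowed at category boundaries, so a stimulus value can be split between two adjacent categories, as it would be when averaged over many Ss.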

Mean judgments of each of the nine stimuli 
were tabulated for each S judging the Low- 
and High-Median distributions using the un- 
restricted, zero-centered scales of judgment. 
These values were then rescaled to equate Ss 
with respect to range of judgment in accord- 
ance with the following procedure: each S’s 
mean judgment of Stimulus 1 was subtracted 
from the mean of his judgments of Stimulus 9 
and then one half this value (i.e., one half 
the range of his mean judgments) was used 
as the denominator of nine different ratios 
with his mean judgments of the respective 
stimuli as the numerators. The means (across 
Ss) of these ratios are plotted in Figure 9. 
The greatest difference between the two func- 
tions in Figure 9 is toward the middle of the 
range of stimuli, the mean judgment being 
lower for the High-Median condition. This is 
consistent with the difference in adaptation 
level reported above (since judgment and ad- 
aptation level are, by definition, negatively 
correlated, at least for the middle of the scale 
of responses). However, the marked crossing- 
over of the functions shown in Figure 7 does 
not appear in Figure 9. A better basis for 
evaluation of the effects of the response scales 
is provided in Figure 10 which plots the mean 


[Figure: mean judgment (ordinate, approximately -1.00 to +1.00) plotted against stimulus (abscissa), with separate curves for the High-Median and Low-Median distributions.] 

Fig. 9. Magnitude estimations of numerousness for 
the Low- and High-Median distributions using un- 
restricted scales, centered at zero. 


judgments for the corresponding Low- and 
High-Median dot conditions under six-cate- 
gory, random recording. The differences in the 
magnitudes of the successive differences be- 
tween the functions in Figure 10 are similar 
to those in Figure 9, but somewhat greater. 
There is also a marked similarity in the gen- 
eral form of the mean stimulus and mean judg- 
ment functions (Figures 7 and 10). However, 
the mean judgment functions provide a sim- 
pler basis for evaluating the proportionate- 
frequency hypothesis since the mean judg- 
ments, unlike the mean stimulus values, would 
have been identical for the Low- and High- 
Median distributions if Ss had used their 
categories of judgment merely to divide the 
range into the six subranges obtained for the 
Rectangular distribution. 
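The rescaling used for the zero-centered judgments plotted in Figure 9 (each S's mean judgments divided by one half the range of his mean judgments) can be sketched briefly. The function name and the two sets of judgments are hypothetical and serve only to show that Ss with very different ranges become comparable.

```python
from statistics import mean


def rescale(mean_judgments):
    """Divide one S's nine mean judgments by one half the range of those means."""
    half_range = (mean_judgments[-1] - mean_judgments[0]) / 2
    if half_range == 0:
        raise ValueError("S gave the same mean judgment to Stimuli 1 and 9")
    return [m / half_range for m in mean_judgments]


# Two hypothetical Ss using zero-centered scales with very different ranges.
s_wide = [-20, -15, -10, -5, 0, 5, 10, 15, 20]
s_narrow = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
ratios = [rescale(s_wide), rescale(s_narrow)]

# Mean ratio for each stimulus across Ss, the quantity plotted against stimulus.
group_means = [mean(column) for column in zip(*ratios)]
print(group_means)
```

After rescaling, the ratios for Stimuli 1 and 9 differ by exactly 2 for every S, so differences between distributions are compared on a common range.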

Mean judgment functions (not shown) 
were computed for the Low- and High-Median 
distributions of dots judged using the two 
different methods of recording and also for 
the two corresponding pairs of square-judging 




[Figure: mean judgment (ordinate) plotted against stimulus, 1 through 9 (abscissa), with separate curves for the High-Median and Low-Median distributions.] 

Fig. 10. Mean judgments of numerousness for the 
Low- and High-Median distributions with random 
recording. 


conditions. All four pairs of functions are 
strikingly similar; though, for both dimen- 
sions, the magnitudes of the differences are 
slightly greater for the column recording func- 
tions. Hypothetical fixed-frequency functions, 
analogous to the functions shown in Figure 8, 
were constructed to show what the mean judg- 
ments of the squares would have been if the 
six categories had been used with the same fre- 
quencies as for the Rectangular distribution 
judged with the corresponding method of re- 
cording. This could not be done for the dots 
since only random recording had been used 
for the Rectangular condition for dots, and 
then with five rather than six categories. It 
had been assumed that these hypothetical 
functions would be very similar for the two 
methods of recording (i.e., judgments of the 
Rectangular distribution would not be affected 
by the method of recording). However, the 
baseline frequencies (i.e., for the two meth- 
ods of recording judgments of the Rectangular 


distribution) differed systematically so that 
the hypothetical fixed-frequency functions also 
differed—the hypothetical differences asso- 
ciated with distribution actually being some- 
what greater for random recording. Since it 
had been hypothesized that the empirical dif- 
ferences would be greater for column record- 
ing, the baseline differences worked against 
the hypothesized recording effects as repre- 
sented in the mean judgment functions. How- 
ever, both pairs of the fixed-frequency func- 
tions (not shown but similar to Figure 18) 
exaggerate the respective differences between 
the actual mean judgment functions in ac- 
cordance with the theory of a range-frequency 
compromise. Additional hypothetical mean 
judgment functions, constructed by averaging 
the respective values on the hypothetical 
fixed-frequency functions and the actual mean 
judgment functions for the Rectangular con- 
ditions (the latter representing the hypo- 
thetical proportionate-subrange tendency), 
of course provide a considerably closer fit to 
the actual mean judgments for the corresponding
Low- and High-Median conditions.
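
The construction of the hypothetical functions can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the text: the presentations are treated as rank-ordered by stimulus value, the categories are assigned to successive blocks of that ordering in fixed proportions, and the presentation frequencies, category proportions, and Rectangular-condition means shown are hypothetical values chosen only for illustration. The function names are likewise hypothetical.

```python
def fixed_frequency_means(stim_freqs, category_props):
    """Mean category per stimulus if categories were applied, in rank order,
    to fixed proportions of all presentations (every frequency must be > 0)."""
    total = sum(stim_freqs)
    # Cumulative upper bound, in presentations, of each category's block.
    bounds = []
    cum = 0.0
    for p in category_props:
        cum += p * total
        bounds.append(cum)
    means = []
    start = 0.0  # rank position at which this stimulus's presentations begin
    for f in stim_freqs:
        end = start + f
        weighted = 0.0
        lower = 0.0
        for cat, upper in enumerate(bounds, start=1):
            # Number of this stimulus's presentations falling in the category.
            overlap = max(0.0, min(end, upper) - max(start, lower))
            weighted += cat * overlap
            lower = upper
        means.append(weighted / f)
        start = end
    return means


def compromise(fixed_means, range_means):
    """Average the fixed-frequency values with the equal-frequency
    (Rectangular) values, as a simple range-frequency compromise."""
    return [(a + b) / 2 for a, b in zip(fixed_means, range_means)]


# Hypothetical skewed presentation frequencies for Stimuli 1-9.
freqs = [9, 7, 6, 5, 5, 4, 4, 3, 2]
# Hypothetical baseline: six categories used equally often.
props = [1 / 6] * 6
# Hypothetical mean judgments from a Rectangular condition.
rect_means = [1.2, 1.8, 2.4, 3.0, 3.5, 4.0, 4.6, 5.2, 5.8]

fixed = fixed_frequency_means(freqs, props)
predicted = compromise(fixed, rect_means)
```

Under these assumptions the fixed-frequency values show the exaggerated curvilinearity, and the averaged values fall between the two tendencies.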

The Low- and High-Median distributions 
of squares were also presented using the back- 
ground-1,000 conditions (in which Ss were 
instructed to compare each square with the 
background). The mean of each S’s judg- 
ments of each stimulus was computed, and 
the medians of the means were determined 
for both distributions (again combining sub- 
groups exposed to four different sequences). 
However, the variability between Ss was extraordinary
(e.g., standard deviations for
Stimulus 5 were over one half the difference 
between the median judgment of Stimulus 9 
and the median judgment of Stimulus 1), and 
the large absolute differences in the judgments 
of the same stimulus for the two distributions 
(lower for the High-Median condition) were 
not statistically significant. Following the 
procedure used by Stevens (1956) to equate 
for individual differences in range (or modu- 
lus), each S’s nine mean judgments were mul- 
tiplied by a constant so that the corrected 
judgment of Stimulus 9 was 1,000 for each S. 
The medians of these positively skewed cor- 
rection factors were 9.5 and 12.1 for the Low- 
and High-Median conditions, respectively, 
with standard deviations of 17.9 and 16.6 




FIG. 11. Magnitude estimations of size for the Low- and High-Median distributions with 1,000 anchored to the background (rescaled so that Stimulus 9 is judged 1,000). [Plot of median judgment against stimulus for the High-Median and Low-Median conditions.]


(the two distributions of correction factors 
were very much alike). The medians of 
the corrected judgments are plotted in Fig- 
ure 11. Again, the relative differences in 
judgments for the two distributions follow
the intertwining pattern of Figure 10 (more 
clearly shown if each Low-Median judgment 
were multiplied by a constant so that the 
functions crossed near Stimulus 6). The mean 
and standard deviation of the corrected judgments
of Stimulus 5 were 536 and 154 for the
Low-Median condition, and 439 and 132 for
the High-Median condition (t = 2.58, df = 57,
p < .02). Thus even when the scale of judg- 
ment is anchored to a stimulus which is con- 
stantly present, the magnitude estimations 
shift systematically with the median of the 
stimulus distribution. 
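
The correction for individual differences in range can be sketched as follows. This is only an illustration of the rescaling described above; the judgments for the three Ss are hypothetical, and the function name is an assumption.

```python
from statistics import median


def rescale_to_anchor(judgments, anchor_index=-1, anchor_value=1000.0):
    """Multiply one S's mean judgments by a constant so that the anchor
    stimulus (by default the last, Stimulus 9) equals anchor_value."""
    factor = anchor_value / judgments[anchor_index]
    return factor, [j * factor for j in judgments]


# Hypothetical mean judgments of Stimuli 1-9 for three Ss.
subjects = [
    [10, 20, 30, 45, 60, 70, 80, 90, 100],
    [50, 90, 140, 200, 260, 330, 400, 450, 500],
    [2, 5, 9, 14, 20, 26, 31, 36, 40],
]

results = [rescale_to_anchor(s) for s in subjects]
factors = [f for f, _ in results]      # one correction factor per S
corrected = [c for _, c in results]    # corrected judgments per S

# Median corrected judgment of each stimulus across Ss.
median_by_stimulus = [median(col) for col in zip(*corrected)]
```

Because each S's judgments are multiplied by a single constant, the ratios among that S's nine judgments are unchanged; only the differences in modulus between Ss are removed.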

To establish the significance of the inter- 
twining relationship between the two func- 
tions in Figure 11, each S’s corrected mean 
judgments of the nine stimuli were multiplied, 


respectively, by the nine coefficients of the 
orthogonal-polynomial values for the quadratic
component (i.e., +28, +7, -8, -17,
-20, -17, -8, +7, and +28, as presented
in Fisher & Yates, 1957, p. 90). The
quadratic component of each S’s mean judg- 
ment function was obtained by summing these 
nine products. The sums were significantly 
more positive for the High-Median condition 
(t = 2.87, df = 57, p < .01). The difference 
between the quadratic components of the 
functions in Figure 10 was also significant
(t = 11.25, df = 71, p < .001).
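
The computation of the quadratic component can be sketched briefly. The coefficients are those quoted above; the function names are assumptions, and the pooled-variance form of the t test is assumed rather than stated in the text.

```python
from math import sqrt
from statistics import mean, variance

# Quadratic orthogonal-polynomial coefficients for nine equally spaced values.
QUADRATIC = [28, 7, -8, -17, -20, -17, -8, 7, 28]


def quadratic_component(mean_judgments):
    """Sum of products of one S's nine mean judgments and the coefficients."""
    if len(mean_judgments) != len(QUADRATIC):
        raise ValueError("expected nine mean judgments")
    return sum(c * j for c, j in zip(QUADRATIC, mean_judgments))


def pooled_t(a, b):
    """Two-sample t with pooled variance (assumed form); returns (t, df)."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```

A strictly linear judgment function yields a component of zero, an upward-bowed (convex) function a positive component, and a downward-bowed function a negative one. The components for the Ss of two conditions would then be passed to `pooled_t`.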

The modified method of magnitude esti- 
mation in which S estimates the actual num- 
ber of dots was used for the Low- and 
High-Midpoint and Low- and High-Median, 
successive-dot conditions. Forty different Ss, 
10 under each of four different sequences, 
were used for each of the four distributions. 
The mean estimates (the median estimates 
were so similar to the means that they are 
not shown) of the number of dots in each of 
the respective patterns are plotted on loga- 
rithmic coordinates in Figure 12. It is clear 
that there are no large systematic differences 
between the estimates as a function of the 
distribution of stimuli. Even the difference
between the extreme mean estimates for the
38-dot pattern, 2.31 dots higher for the Low-
Midpoint than for the High-Midpoint distribution,
is not significant (t = 1.66, df = 78,
p = .10). The larger mean differences for the
higher values are accompanied by larger
within-condition variability so that they are
not to be taken seriously either. Thus the
marked context effects obtained for these distributions
with the category scales and also
with the unrestricted numerical scales, either
0-centered or background-1,000, do not occur
when the scale of judgment is completely anchored
to a fundamental physical measure
(actual number of dots).

FIG. 12. Log-log plot of mean physical estimations against actual number of dots (8, 12, 18, 27, 38, 53, 72, 95, and 122) for four distributions with different midpoints and medians (High Midpoint, Low Midpoint, High Median, Low Median).

It may also be inferred from Figure 12 
that there are no systematic or constant 
errors in the estimates of the lower stimulus 
values (the patterns with 8, 12, 18, or 27 
dots) but that Ss tend to underestimate the 
number of dots in the denser patterns. The 
curvilinear form of these psychometric func- 
tions thus deviates from the linear log-log 
(power) function reported by Stevens and 
Galanter (1957). The present stimulus ex- 
posures were of sufficient duration (6 sec.) 
to permit some actual counting of the dots 
and complete counting for the eight-dot pat- 
tern. Power functions might be more closely 
approximated by assuming that the exponent 
shifts for patterns with more than 53 dots. 
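
The suggestion of a shifting exponent can be illustrated by fitting separate slopes on logarithmic coordinates. In the sketch below the dot counts are those of the nine patterns, but the mean estimates are hypothetical values invented for the example, and the choice of the 53-dot pattern as the point shared by both segments is an assumption.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) on log10(x), i.e., the exponent
    of a power function fitted to the points."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


DOTS = [8, 12, 18, 27, 38, 53, 72, 95, 122]
# Hypothetical mean estimates: accurate for sparse patterns,
# too low for dense ones.
ESTIMATES = [8.0, 12.0, 18.0, 27.0, 37.0, 50.0, 63.0, 77.0, 92.0]

lower = [(d, e) for d, e in zip(DOTS, ESTIMATES) if d <= 53]
upper = [(d, e) for d, e in zip(DOTS, ESTIMATES) if d >= 53]

slope_lower = loglog_slope(*zip(*lower))  # exponent up to 53 dots
slope_upper = loglog_slope(*zip(*upper))  # exponent from 53 dots upward
```

With estimates of this general form, the fitted exponent is close to 1.0 for the sparser patterns and distinctly smaller for the denser ones.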

Estimated Number of Stimuli. After the 
final (i.e., ninetieth) judgment, Ss in these
four physical-estimation conditions were in- 
structed to record the number of different pat- 
terns which they recalled having been shown. 
Individual estimates varied from 4 to 45 
patterns, with fewer than 10% of the esti- 
mates being exactly correct (i.e., either 7 or 
9 patterns, depending on the distribution). 
For each of the four distributions, the median 
estimates were within 1.0 of the actual num- 
ber of patterns, with no indication of system- 
atic error. 

Summary. The results of this section add 
balance to the evidence for the range-fre- 
quency compromise in judgment. First, they 
demonstrate that a complete description of 
the frame of reference for judgment must in- 
clude much more than the values of the two 
extreme stimuli. Even when different distri- 
butions of stimuli share common end values, 
as was the case for each pair of distributions 
studied in this section, the scale of judgment 
shifts systematically with variation in the 


relative frequencies with which the different 
stimuli are presented. Nor can the observed 
variation be attributed to a pooled effect of 
all the stimuli, as represented by the mean, 
for the means were always identical for each 
pair of conditions compared in this section. 
The median was the only measure of central 
tendency which varied. In accordance with 
the experimental hypothesis, adaptation level 
varied directly (and the mean judgment of 
the middle stimulus varied inversely) with 
variation of the median value of the distribution
of stimuli. This is the first such demonstration
of the median effect using traditional
perceptual materials and the method of single 
stimuli. 
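
The logic of holding the end values and the mean constant while varying the median can be shown with a small numerical sketch. The presentation frequencies below are hypothetical (they are not the frequencies used in these experiments), and the function name is an assumption.

```python
from statistics import median


def describe(freqs):
    """Midpoint, mean, and median of a distribution given as presentation
    frequencies for Stimuli 1, 2, ..., len(freqs)."""
    values = [s for s, f in enumerate(freqs, start=1) for _ in range(f)]
    midpoint = (min(values) + max(values)) / 2
    mean_value = sum(values) / len(values)
    return midpoint, mean_value, median(values)


# Hypothetical frequencies for Stimuli 1-9; the second set is the
# mirror image of the first about Stimulus 5.
low_median = [2, 1, 3, 8, 1, 1, 1, 2, 4]
high_median = low_median[::-1]

low_stats = describe(low_median)
high_stats = describe(high_median)
```

For these two hypothetical distributions the midpoint and mean are both 5, whereas the medians are 4 and 6, so that any systematic difference in judgment between them could not be attributed to the end stimuli or to the mean.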

It is clear that the median has considerable 
predictive value for all three perceptual di- 
mensions. However, the results of this sec- 
tion indicate that the frequency portion of 
the range-frequency compromise gets its most 
impressive support from a more complete con- 
sideration of the effects of local variation in 
frequencies over the range of stimuli. In com- 
paring mean judgments for corresponding 
pairs of conditions, the most striking feature 
of the response scales is their peculiar curvilinearity
(i.e., they intertwine instead of
being parallel). The middle judgments vary 
in the expected direction as a function of the 
difference in the median, but the direction of 
the differences at other values is usually in 
the opposite direction. This intertwining of 
scales is inconsistent with the general law of 
judgment for the theory of adaptation level 
which assumes that judgments are linearly 
related to the differences between the judged 
stimuli and adaptation level. Instead, the 
curvilinearity follows the form which would 
have been obtained if the categories were 
always used with the same relative frequen- 
cies (i.e., the frequencies obtained when the 
stimuli were presented with equal frequen- 
cies). The fact that the observed curvi- 
linearity is never as marked as for the hypo- 
thetical proportionate-frequency functions 
adds further support to the theory of a range- 
frequency compromise. If Ss simply divided 
the range into equal subranges, the mean 
judgment functions would be linear. 

Although the interactions between distribu- 
tion and method of recording are never great, 




they are in general consistent with the hy- 
pothesis that the effects of the differences in 
median would be more pronounced when Ss 
had a continuous record of their response fre- 
quencies and could thus more readily display 
the hypothesized tendency to use their cate- 
gories with fixed relative frequencies. The ex- 
pected interactions are clearest when the en- 
tire scale of judgment (rather than just 
adaptation level) is considered. The curvilinearity
and the differences within each corresponding
pair of distributions are greater
when Ss have the record of their response 
frequencies. 

Surprising evidence for the generality of 
the proportionate-frequency tendency is pro- 
vided by the results with numerical scales. 
The general forms of the mean judgment 
functions for zero-centered conditions (i.e., 
when Ss were instructed to call the average 
stimulus “zero”) resemble those obtained 
when Ss were restricted to six verbally defined 
categories. Since the zero-centered instruc- 
tions placed no restrictions on the number of 
different values which Ss could use (they in 
fact use an average of almost three times as 
many different responses), it appears that 
the proportionate-frequency tendency applies 
to regions of the response scale and not just to 
frequency of use of a stipulated set of cate- 
gories. Narrower-than-average subranges of 
the total scale of judgment are applied to 
those stimuli presented with greater-than- 
average frequency. The same conclusions can 
be drawn from the judgments obtained using 


Stevens’ magnitude-estimation instructions 
(with the numerical scale anchored to the 
physical size of the background). It thus ap- 
pears that magnitude estimations reflect the 
range-frequency compromise in much the 
same way as do the judgments made in terms 
of absolute, verbal categories. It was only 
when Ss were instructed to estimate the actual 
number of dots that no evidence was obtained 
for a range-frequency compromise. 


Mean 


Adaptation Levels and Scales of Judgment. 
Four different pairs of distributions were used 
to assess the effects of variation in mean, with 
the median and midpoint of the stimulus 
values held constant. The Low- and High- 
Mean distributions were used for judgments 
of successively presented patterns of dots 
(with four different orders of presentation 
for each distribution), for judgments of simul- 
taneously presented patterns of dots, and also 
for judgments of lifted weights (again differ- 
ent orders for each pair of Ss). Only the 
random recording (with five categories of 
judgment) was used for these two distribu- 
tions. As shown in Table 5, adaptation levels 
are not systematically affected by the varia- 
tion in mean. Somewhat different distribu- 
tions were used for the judgments of size, 
Low Mean (size) and High Mean (size); 
and these judgments were made using column 
recording (with six categories of judgment).
Again the differences in adaptation level are 


TABLE 5

VARIATION IN MEAN: ADAPTATION LEVELS FOR DIFFERENT STIMULUS DIMENSIONS AND
METHODS OF PRESENTATION

                Numerousness                            Size (a)                   Weight
                Successive        Simultaneous          Successive                 Successive
Distribution    N    M     SD     N    M     SD         N    M     SD              N    M     SD
Low Mean        41   4.44  .82    29   4.85  .66        28   5.27  [illegible]     40   4.82  1.07
High Mean       41   4.37  .86    27   4.93  .93        28   5.37  .86             40   4.79  1.05

(a) Six-category, column recording for Low- and High-Mean (size) conditions; all others used five-category, random recording.




insignificant. The mean judgments of the 
respective squares under this condition are 
shown in Figure 13. The difference between 
the mean judgments of the middle stimulus 
(5) is actually in the opposite direction from 
what one would expect on the basis of the 
direction of the difference in adaptation level 
(the correlation between judgment and 
adaptation level usually being negative), but 
this difference is small. Some of the other dif- 
ferences are clearly significant (e.g., for Stimu- 
lus 2); and these are consistent with the hy- 
pothesis that Ss tend to use their categories 
with fixed relative frequencies. With only 
three judgments of each of the three smallest 
squares, the High-Mean (size) Ss judge 
Stimulus 2 Very small slightly more often than 
they judge it Small. However, the Low-Mean 
(size) Ss were shown the smallest square al- 
most four times as often, so that they tend to 
exhaust their use of the Very small category 
on Stimulus 1 and thus judge Stimulus 2 


FIG. 13. Mean judgments for the Low- and High-Mean (size) distributions with column recording. [Plot of mean judgment against stimulus (1-9); High Mean (size) shown with a solid line, Low Mean (size) with a dashed line.]


Small. Similarly, the Ss judging the High- 
Mean (size) distribution are presented the 
largest square (9) so much more frequently 
that they tend to use lower categories for 
Stimuli 8 and 9. As with the other distribu- 
tions of squares, reported in the previous sec- 
tions, there is a marked tendency to omit the 
Very large category. Perhaps none of the
squares seem Very large because the still 
larger border is so prominent in S’s visual 
field. 

The frequencies with which the six cate- 
gories were used for the Rectangular distribu- 
tion of squares, with column recording, served 
as the baseline for the construction of hypo- 
thetical fixed-frequency functions for the 
Low-Mean (size) and High-Mean (size) con- 
ditions. The procedure followed that de- 
scribed above for the Low- and High-Median 
distributions of squares. The resulting hypo- 
thetical mean judgment functions (not 
shown) differ markedly from each other, exag- 
gerating for eight of the nine stimuli the 
empirical differences plotted in Figure 13. 
The hypothetical functions come together only 
for Stimulus 5, which is the stimulus for which 
the direction of the differences between the 
empirical functions is reversed. Thus, the 
judgments may reflect the proportionate-fre- 
quency tendency even when the median and 
midpoint are held constant and the variation 
in mean does not affect adaptation level. 

Two other pairs of distributions (differing 
in mean but not in midpoint and median) 
were also studied for the successively pre- 
sented dots, using random recording and five 
categories of judgment. Only two, rather than 
the usual four, orders of presentation were 
used for each distribution. The mean and 
standard deviation of the adaptation levels are 
4.70 and .57 for Low Mean (dots), and 5.02 
and .69 for High Mean (dots). Although the 
difference in adaptation level is greater for 
these two distributions, and also in the same 
direction as the difference in the means of the 
stimuli, the difference is again not significant 
(F = 2.83, df = 1/40, p = .10). 

For the other pair, the mean and standard deviation of the
adaptation levels are 5.06 and .48 for Low
Mean (gap), and 4.37 and .61 for High Mean
(gap). This difference is significant (t = 4.83, 
df = 1/60, p < .001) but in the direction 




FIG. 14. Mean judgments of numerousness for the Low- and High-Mean (gap) distributions with random recording. [Plot of mean judgment against stimulus for the High-Mean (gap) and Low-Mean (gap) conditions.]


opposite to that derived from adaptation level 
theory; it is also reflected in the mean judg- 
ments of the middle stimulus (5), 2.98 for 
Low Mean (gap) and 3.26 for High Mean 
(gap). These distributions differ from the 
others reported in this study in that gaps have 
been introduced, the 12- and 18-dot patterns 
having been omitted for the Low-Mean (gap) 
distribution, the 72- and 95-dot patterns being 
omitted for the High-Mean (gap) distribution. 
Insofar as Ss tend to use all available cate- 
gories, the adaptation levels would tend to 
shift away from the gap. Thus, as shown in 
Figure 14, Ss almost always judged the 122- 
dot pattern (9) to contain Very many dots 
(“5”). But since the next stimulus down in
the High-Mean (gap) distribution was the 
53-dot pattern (6), Ss judging this distribu- 
tion had to call Stimulus 6 Many (“4”) if 
they were to use their fourth category at all. 
Thus the High-Mean (gap) Ss were more 
likely to judge Stimulus 6 Many but less likely 
to judge Stimulus 4 Few. 

To test this interpretation, additional 
groups were exposed to the Low-Mean (gap) 
and High-Mean (gap) distributions under the 


same conditions but with the restriction that 
they use only two categories of judgment 
(Few and Many) rather than five. The new 
groups were thus free of the hypothesized 
tendency to use categories which would or- 
dinarily be applied only to the missing stimuli 
(i.e., the gap). The resulting adaptation levels
were computed by linear interpolation on each 
S’s psychometric function relating the per- 
centage of Many judgments to the stimulus 
values (with adaptation level as the 50% 
point). Inspection of the frequency distribu- 
tions of these two-category adaptation levels 
reveals that unlike the distributions of adap- 
tation levels for the conditions previously 
reported, both of these clearly deviate from 
normality and also from even an approximate 
approach to continuity. Both frequency dis- 
tributions are bimodal; 8 of the 29 Low-Mean 
(gap) Ss broke their scale of judgment at the 
gap (i.e., between Stimuli 1 and 4), and the 
limen for 8 of the 28 High-Mean (gap) Ss was 
also at the gap in their distribution of stimuli 
(i.e., between 6 and 9). When these “gap” Ss
are excluded from the analysis, the means and 
standard deviations of the adaptation levels 
for the remaining Ss are 5.34 and .56 for Low 
Mean (gap), 5.08 and .67 for High Mean 
(gap) (t = 1.35, df = 39, p > .10). The reduction
in difference between adaptation
levels for this portion of the data is thus con- 
sistent with the special interpretation of the 
large difference between Low-Mean (gap) and 
High-Mean (gap) adaptation levels found for 
the five-category conditions. However, the 
fact that the limen for approximately one 
fourth of the two-category Ss is at the gap in- 
dicates that the range-frequency compromise 
is inadequate to explain the judgments for this 
kind of distribution. The respective gaps are 
far removed from either the midpoint or the 
median, but these sizable subgroups of Ss 
nevertheless use their categories as if to 
specify the location of the gaps. 
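
The two-category adaptation level, taken as the 50% point of each S's psychometric function, can be sketched as a linear interpolation. The percentages below are hypothetical; the stimulus numbers are those remaining in the Low-Mean (gap) distribution once the 12- and 18-dot patterns (Stimuli 2 and 3) are omitted. The function name and the rule of taking the first upward crossing are assumptions.

```python
def adaptation_level(stimuli, pct_many):
    """Stimulus value at which the percentage of Many judgments reaches 50,
    by linear interpolation between adjacent stimuli. Returns None if the
    function never rises through 50%."""
    pairs = list(zip(stimuli, pct_many))
    for (s0, p0), (s1, p1) in zip(pairs, pairs[1:]):
        if p0 <= 50.0 <= p1 and p1 > p0:
            return s0 + (50.0 - p0) * (s1 - s0) / (p1 - p0)
    return None


# Stimuli presented in the Low-Mean (gap) distribution.
STIMULI = [1, 4, 5, 6, 7, 8, 9]

# Hypothetical S whose judgments change gradually near the middle.
gradual = [0.0, 0.0, 20.0, 80.0, 100.0, 100.0, 100.0]
# Hypothetical S who breaks the scale at the gap (between Stimuli 1 and 4).
gap_breaker = [0.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]

al_gradual = adaptation_level(STIMULI, gradual)
al_gap = adaptation_level(STIMULI, gap_breaker)
```

For the hypothetical S who applies Few only to Stimulus 1, the interpolated limen falls inside the gap, which is how a bimodal distribution of adaptation levels of the kind described above could arise.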

Summary. The conditions described in this 
section were designed to provide direct evi- 
dence for the pooling feature of the theory of 
adaptation level. Insofar as adaptation level 
is a weighted average of the various stimulus 
values which form the context for judgment, it 
should be sensitive to variations in the stimu- 
lus distribution which are not specified by 




either the midpoint or the median. In previ- 
ous applications of the theory to situations of 
the present type (Helson, 1959; Johnson, 
1955), the various stimuli have been weighted 
only with respect to their relative frequency 
of presentation. This makes adaptation level 
a linear function of the mean of the stimuli 
presented for judgment. The theory was 
tested in this section by assessing the shifts in 
adaptation level associated with variation in 
the mean, keeping the midpoint and the 
median constant. 

With no variation in either the midpoint or 
median, the postulated range-frequency com- 
promise should yield no variation in adapta- 
tion level, except insofar as inequalities in the 
hypothetical category frequencies and crude- 
ness in the operational definition of adapta- 
tion level would produce minor variations in 
this dependent variable. However, the pro- 
portionate-frequency tendency should produce 
a curvilinear relationship between the judg- 
ments and stimulus values when the latter are 
presented with markedly unequal frequencies. 
Even if the adaptation levels were identical for 
two such distributions, there should be dif- 
ferences in the stimuli to which judgments 
from other portions of the response scale were 
applied. 

In spite of the fact that a number of differ- 
ent pairs of distributions and three different 
perceptual dimensions were studied, the dis- 
tributions in each pair differing markedly with 
respect to their means, no systematic rela- 
tionship was found between mean and adapta- 
tion level (the one significant difference in 
adaptation level being in the direction opposed 
to the prediction from the theory of adapta- 
tion level). This feature of the data thus 
provides no support for the theory of adapta- 
tion level, but it is consistent with the range- 
frequency compromise. The postulated com- 
promise gets more positive support from the 
analysis of the mean judgment functions. The 
obtained differences in mean judgment again 
reflect, in reduced magnitude, the differences 
which would have been obtained if Ss had used 
their categories with the same frequencies as 
they had been used when stimuli were pre- 
sented with equal frequencies. 

The effects of introducing a gap into the 
distribution of stimuli require additional as- 


sumptions about the rules of judgment. Al- 
though gaps are present in only two of the 
distributions, it seems clear that Ss sometimes 
use their categories as if the task were to 
locate the position of the break in the distribu- 
tion of stimuli. When restricted to two cate- 
gories, many Ss applied one category to six of 
the seven stimuli—even though this also 
meant that the same category was applied on 
two thirds of the presentations and to almost 
two thirds of the total range of stimuli. Simi- 
lar, but less extreme, departures from the 
range-frequency compromise occur with five 
categories of judgment. 


Restriction of Range 


Following the presentation and judgment of 
the six E recording, lifted weight conditions 
described in the preceding sections (i.e., Low 
versus High Midpoint, Median, and Mean), 
20 Ss under each condition were given 50 additional
presentations of the five lightest
weights (i.e., 10 each of 1, 2, 3, 4, and 5), 
randomized in blocks of five. These shifts in 
the distribution of stimuli were not called to 
S’s attention, and there was no break in the 
presentation sequence at the point of shift. 
The effects of somewhat similar restrictions of 
range had previously been studied for judg- 
ments of visual size and tonal pitch, the results 
indicating that Ss do not readjust their scales 
of judgment unless the missing values occur 
frequently during the preshift sequence (Par- 
ducci, 1956a). For the present Median and 
Mean conditions, the shift represented a re- 
striction of range (the four missing values 
having been presented only infrequently in the 
Low-Median condition) with no addition to 
the preshift distribution. However, the shift 
included the addition of new stimuli (1 and 2) 
as well as a restriction of the range for the 
present High-Midpoint condition. 

Marked readjustment of the scale of judg- 
ment was obtained for all six conditions. The 
mean judgments of Stimulus 5 increase ap- 
proximately .70 of a category step for both the 
Low- and High-Midpoint conditions, reach- 
ing an asymptote early in the postshift se- 
quence, with no systematic convergence of the 
respective means (higher for the Low-Mid- 
point condition). These shifts in judgment are 




comparable in magnitude to the greatest of the 
previously reported shifts following restriction 
of range. Virtually the same description ap- 
plies to the shifts for the Low- and High-Mean 
conditions. The shifts are even larger for the 
Low- and High-Median conditions, and this 
increase may be associated with the use of six 
categories of judgment (rather than the five 
categories used for the other weight condi- 
tions). However, the shifts in stimulus dis- 
tribution for the Median conditions are most 
similar to the shifts in distribution which had 
previously resulted in no readjustment in the 
scale of judgment (e.g., the P-88 Condition in 
Parducci, 1956a). Although the present shifts 
in stimulus distribution are in no case iden- 
tical to the previous shifts, the general trend of 
the results suggests that considerably greater 
readjustment of the scale of judgment is ob- 
tained using the dimension of lifted weight 
than following similar shifts in distributions of 
visual and auditory stimuli. 

The greater adjustment to the postshift dis- 
tribution for lifted weights may reflect weaker 
anchoring of the extreme stimuli. The values 
of the end stimuli should be relatively easy to 
remember when they are squares judged 
against a fixed background or even when they 
are tones varying markedly in pitch. However, 
the remembered value or PSE for the heaviest 
weight rapidly shifts downward, toward the 
other values in the distribution of weights 
(Parducci & Marshall, 1962), so that there 
should be a postshift adjustment of the judg- 
ments. Both the theory of adaptation level 


and the frequency portion of the present com- 
promise theory predict postshift adjustment 
for all dimensions. The relative absence of 
postshift adjustments to restriction of range 
with more easily anchored dimensions appears 
consistent with the range portion of the com- 
promise theory, the preshift range continuing 
to fix the scale of judgment for such dimensions.


Midpoint and Median 


This section describes the results of varying 
both the midpoint and median, in various 
combinations, with the mean held constant. As 
shown in Table 1, the midpoints of the Low- 
and High-Midpoint-Median distributions dif- 
fer by one stimulus step; their medians differ, 
in the same direction, by somewhat less than 
one step (i.e., four of the five Low-Midpoint- 
Median presentations of Stimulus 5 would 
rank above the twenty-third stimulus if the 
45 presentations in the set were ordered in 
terms of decreasing magnitude; this would be 
true for only one of the five High-Midpoint- 
Median presentations of Stimulus 5). Table 6 
shows the adaptation levels obtained under 
various conditions for these two distributions. 

Separate analyses of variance were performed
for the judgments of size and numerousness.
For size, the analysis followed a 2 × 2 × 4
design, with distribution (Low- versus
High-Midpoint-Median), method of record- 
ing (random versus column), and sequence 
(four different orders of presentation) as the 


TABLE 6

VARIATION IN MIDPOINT AND MEDIAN: ADAPTATION LEVELS FOR DIFFERENT STIMULUS DIMENSIONS
AND METHODS OF RECORDING

Recording (a) for dimension

                        Numerousness               Size
                        Random                     Random                               Column
Distribution            N    M     SD              N    M            SD                 N            M            SD
Low-Midpoint-Median     24   4.36  .38             32   5.01         .61                32           4.98         .41
High-Midpoint-Median    24   4.67  [illegible]     32   [illegible]  [illegible]        [illegible]  [illegible]  .50

(a) Six categories for all conditions.




independent variables. The effects of distribu- 
tion are in the expected direction, i.e., higher 
adaptation level for the higher midpoint- 
median combination, and statistically signifi- 
cant (F = 27.83, df = 1/112, p < .001); 
but these distribution effects are not inde- 
pendent of the stimulus sequence (F = 4.43, 
df = 3/112, p < .01, for the interaction be- 
tween distribution and sequence). This was 
the only clear evidence obtained in any of the 
analyses that ordinal effects interact with the 
variable of major interest. Confidence in the 
generality of the present distribution effects is 
reinforced by the fact that adaptation level is 
lower for each of these eight Low-Midpoint- 
Median groups than for the High-Midpoint- 
Median group exposed to the corresponding 
sequence of presentation (i.e., sequence se- 
lected according to the mirror-image procedure 
described under the Order of Presentation 
section, above). None of the other conditions, 
method of recording, sequence, or the inter- 
actions, had effects even approaching statis- 
tical significance. 

The Low- and High-Midpoint-Median dis- 
tributions (for judgments of numerousness) 
were studied only with random recording and 
with three (instead of the usual four) se- 
quences of presentation. The distribution ef- 
fects were again in the expected direction 
(F = 3.10, df = 1/42, .05 < p < .10). Here,
the interaction between distribution and se- 
quence did not approach significance (F < 
1.0). 

To assess the effects of variation in median 
with more regular distributions, two additional 
sets of squares (Low-Midpoint, High-Median 
and High-Midpoint, Low-Median) were presented
for judgments of size using column recording.
As shown in Table 1, the midpoint
of the Low-Midpoint, High-Median distribution
is the same as the midpoint for the Low-
Midpoint-Median distribution; and the midpoint
for the High-Midpoint, Low-Median
distribution is the same as for the High-Mid- 
point-Median distribution. The medians for 
these two additional distributions correspond 


more closely to the medians for the distributions
with the opposite midpoints (i.e., approximately
half the stimuli from both the
High-Midpoint, Low-Median and Low-Mid- 
point-Median distributions are smaller than 


Stimulus 5, and half the stimuli from both the 
Low-Midpoint, High-Median and High-Mid- 
point-Median distributions are larger than 
Stimulus 5). With 32 Ss in each of these two 


new conditions (four subgroups of 8 Ss each,


judging different sequences of squares), the 
new means and standard deviations are 5.17 
and .79 for High-Midpoint, Low-Median; 
and 5.54 and .49 for Low-Midpoint, High- 
Median. All four of these Midpoint-Median 
square-judging conditions were combined into 
a 2 × 2 factorial design, with midpoint and
median as the independent variables. Analysis 
of variance indicated that only the effects 
of median are significant (F = 19.46, df 
= 1/124, p < .001). 

The relationships between these four adap- 
tation levels for column recorded judgments 
of size correspond closely to the relationships 
between the adaptation levels obtained for 
similar distributions of simultaneously pre- 
sented lines, judged in terms of six categories 
of length (Parducci & Marshall, 1961b). The 
only difference between corresponding dis- 
tributions of stimuli for the square and line 
conditions is that where the same square is 
repeated (e.g., Stimulus 1 is presented five
times in the Low-Midpoint, High-Median 
distribution of squares), multiple lines were 
drawn from the same subrange of length (e.g., 
five lines between 10 and 30 mm. in the Low- 
Midpoint, High-Median distribution of lines). 
Averaging across the alternative methods 
of presentation for the lines, the obtained 
adaptation levels are almost equal for the 
Low-Midpoint, High-Median and High-Mid- 
point-Median conditions, and the adaptation 
level for High Midpoint, Low Median is 
slightly larger than the adaptation level for 
Low-Midpoint-Median—just as for the pres- 
ent square-judging conditions. This congru- 
ence across stimulus dimensions (squares 
versus lines), method of presentation (successive
versus simultaneous), and method of determining
adaptation level (mean of stimuli judged with two
middle categories versus middle limen indicated by
abrupt shift in categories), suggests that these
particular distributions have special features
which counteract
the expected variation in adaptation level in 
association with the differences in midpoint. 

Fig. 15. Mean judgments of size for the Low- and
High-Midpoint-Median distributions with column
recording. (Axes: mean judgment against stimulus.)

The mean judgments for the four square-judging,
column recording conditions included
in the preceding analysis are presented in 
Figures 15 and 16. The psychometric func- 
tions for the Low-Midpoint-Median and 
High-Midpoint-Median distributions (Figure 
15) resemble, respectively, those shown for 
the Low-Midpoint and High-Midpoint dis- 
tributions (Figure 5); and the functions for 
Low-Midpoint, High-Median and the High- 
Midpoint, Low-Median distributions (Figure 
16) show the same double crossings-over 
shown for the Low-Median and High-Median 
distributions (Figure 10). The corresponding 
distributions of stimuli also resemble each 
other (i.e., Low-Midpoint-Median resembles 
Low Midpoint; High-Midpoint-Median is 
like High Midpoint; High Midpoint, Low 
Median is like Low Median; and Low Mid- 
point, High Median is like High Median). 
The nature of the relationships between the 
four mean judgment functions in Figures 15 
and 16 may be somewhat clarified by the
hypothetical functions presented for the same
four conditions in Figures 17 and 18. These 
hypothetical functions were constructed fol- 
lowing the procedure used for Figure 8. First, 
tabulation was made of the frequencies with 
which the six categories of judgment were used 
for the Rectangular, column recording judg- 
ments of size. The empirical percentages of 
total category use, going from Very small to 
Very large, were as follows: 14.0, 21.9, 22.2, 
19.0, 16.8, and 6.0, somewhat more positively 
skewed than for the random recording, Rec- 
tangular size condition. The hypothetical 
mean judgments of each stimulus for each 
distribution were then computed, again as- 
suming that the categories would be used with 
the same frequencies and that the scales of 
judgment would be perfectly ordinal. 
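
The fixed-frequency construction just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name is hypothetical, "perfectly ordinal" is read as meaning that the presentations, ranked by stimulus value, are cut into consecutive blocks whose sizes match the category percentages, and a stimulus whose presentations straddle a category boundary is given the weighted average of the two categories. The nine-by-five rectangular frequencies in the usage lines are an assumption for illustration, not a transcription of Table 1.

```python
def hypothetical_mean_judgments(stim_freqs, category_pcts):
    """Mean category (1 = lowest) each stimulus would receive if the
    categories were used with fixed relative frequencies and the scale
    of judgment were perfectly ordinal.

    stim_freqs    -- presentations per stimulus, smallest stimulus first
    category_pcts -- percentage of all judgments per category, lowest first
    """
    total = float(sum(stim_freqs))
    pct_total = float(sum(category_pcts))

    # Upper boundary of each category on the 0..1 scale of ranked presentations.
    bounds = []
    acc = 0.0
    for pct in category_pcts:
        acc += pct / pct_total
        bounds.append(acc)
    bounds[-1] = 1.0  # guard against rounding in the percentages

    means = []
    lo = 0.0
    for freq in stim_freqs:
        if freq == 0:
            means.append(None)  # stimulus never presented
            continue
        hi = lo + freq / total
        weighted = 0.0
        cat_lo = 0.0
        for category, cat_hi in enumerate(bounds, start=1):
            # Share of this stimulus's presentations falling in this category.
            overlap = max(0.0, min(hi, cat_hi) - max(lo, cat_lo))
            weighted += category * overlap
            cat_lo = cat_hi
        means.append(weighted / (hi - lo))
        lo = hi
    return means


# Category percentages reported for the Rectangular, column recording condition.
rect_pcts = [14.0, 21.9, 22.2, 19.0, 16.8, 6.0]
# Assumed rectangular set: nine stimuli, five presentations each.
print(hypothetical_mean_judgments([5] * 9, rect_pcts))
```

Feeding the same percentages to a skewed set of frequencies moves the category boundaries to different stimuli, which is what produces the crossings of the hypothetical functions.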
Fig. 16. Mean judgments of size for the Low-Midpoint,
High-Median, and the High-Midpoint, Low-Median
distributions with column recording. (Axes: mean
judgment against Stimuli 1-9.)

Fig. 17. Hypothetical mean judgments of size for
the conditions shown in Figure 15, assuming that each
category was used with the same frequency as for the
Rectangular distribution of squares with column
recording. (Axes: hypothetical mean judgment against
stimulus.)

Consider first the Low-Midpoint, High-Median
and High-Midpoint, Low-Median
conditions where the correspondence in form
between the hypothetical (Figure 18) and
empirical (Figure 16) functions is particularly 
striking. In both figures, these two functions 
cross between Stimuli 3 and 4 and back again 
between Stimuli 6 and 7. The statistical sig- 
nificance of this intertwining was tested for 
the empirical functions by computing for each 
S a single score which represented the quadratic
component of his mean judgment function
(following the same procedure as described
above for the background-1,000 condition, but
using seven coefficients for the second order,
+5, 0, -3, -4, -3, 0, and +5, rather than the full
nine since only the middle seven stimuli appeared
in both distributions). The mean and standard
deviation of these scores were 4.19 and 2.75,
respectively, for the Low-Midpoint, High-Median
condition; -3.08 and 2.82 for the High-Midpoint,
Low-Median condition. The difference
between these means is significant (t = 11.25,
df = 71, p < .001), indicating that the functions
in Figure 16 differ in their deviations
from linearity in the general directions indi- 
cated by Figure 18. This correspondence be- 
tween what Ss actually did and what they 
would have done had they used their categories 
with fixed percentages is consistent with the 
significant difference in adaptation level asso- 
ciated with the differences in the medians of 
the two distributions of stimuli. The fact that 
the empirical functions are more parallel than 
the hypothetical functions follows from the 
assumption that Ss also tend to divide the 
range into fixed intervals. 
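
The quadratic-component score used in the test above can be illustrated as follows. This is a sketch only: it assumes the standard tabled second-order orthogonal-polynomial coefficients for seven equally spaced points (5, 0, -3, -4, -3, 0, 5), the function name is hypothetical, and any further scaling applied to the scores in the original analysis is not recoverable here.

```python
# Second-order orthogonal-polynomial coefficients for seven equally spaced
# points (assumed: the standard tabled values).
QUAD_COEFFS_7 = (5, 0, -3, -4, -3, 0, 5)


def quadratic_score(mean_judgments):
    """Quadratic component of one S's mean judgment function over the
    seven stimuli common to both distributions.

    A positive score means the function is bowed upward (ends high
    relative to the middle); a negative score means the reverse.
    """
    if len(mean_judgments) != len(QUAD_COEFFS_7):
        raise ValueError("expected mean judgments for exactly seven stimuli")
    return sum(c * j for c, j in zip(QUAD_COEFFS_7, mean_judgments))


# A strictly linear function has no quadratic component.
print(quadratic_score([1, 2, 3, 4, 5, 6, 7]))
```

Because the coefficients sum to zero and are orthogonal to a linear trend, the score is unaffected by the overall level or slope of a judgment function and reflects only its curvature.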

This compromise is clearest for the hypothetical
Low-Midpoint-Median and High-Midpoint-Median
functions. These also come together, part, come
together, and part again (Figure 17), but the
differences are much less marked so that the
actual mean judgment functions would be more
nearly parallel even if Ss used their categories with
the exact frequencies used for the Rectangular
condition. However, there is no meeting and
parting of the empirical functions in Figure
15, only the parallelism characteristic of the
Low- and High-Midpoint functions (e.g.,
Figure 5). Thus, in spite of the failure to find
a significant difference in adaptation level
associated with independent variation of the
midpoint for these four distributions, the
mean judgment curves show consistent differences
which are not associated with the hypothetical
tendency to use the categories of
judgment with fixed relative frequencies.

Fig. 18. Hypothetical mean judgments of size for
the conditions shown in Figure 16, assuming that each
category was used with the same frequency as for the
Rectangular distribution of squares with column
recording. (Axes: hypothetical mean judgment against
stimulus.)

Fig. 19. Hypothetical mean judgments of size for
the Low- and High-Midpoint conditions shown in
Figure 5, assuming that each category was used with
the same frequency as for the Rectangular distribution
of squares with random recording. (Axes:
hypothetical mean judgment against stimulus.)

This argument for the existence of a pro- 
portionate-subrange tendency may also be 
applied to the mean judgments of the original 
midpoint conditions presented in Figure 5.
Hypothetical fixed-frequency curves, based on
the actual frequencies of category usage for 
the random recorded, Rectangular square con- 
dition, were constructed following the same 
procedure for the original Low-Midpoint and 
High-Midpoint distributions. These hypo- 
thetical functions are presented in Figure 19. 
Insofar as the differences between them are 
systematic, they are in the direction opposite 
to the differences between the functions in 
Figure 5 which represent the empirical judg- 
ments of these two distributions. Thus, the 
hypothetical proportionate-frequency tendency 
would appear to actually reduce those differ- 
ences between the Low- and High-Midpoint 
judgments which depend upon the differences 
between their end values, a consideration 
which adds weight to the hypothesized tend- 
ency to divide the range into proportionate 
subranges. 

Fig. 20. Equal-discriminability scaling of judgments
of size for the two conditions shown in Figures 16
and 18. (Vertical axis: scale of equal discriminability.)

The judgments for both the Low-Midpoint,
High-Median and the High-Midpoint, Low-Median
conditions were also scaled using the
method of successive intervals as described in 
Torgerson (1958, pp. 236-240). This pro- 
cedure measures the “discriminability” of the 
stimuli; but since the zero point of each scale 
is arbitrarily fixed, interest in the two discrim- 
inability scales (Figure 20) should be directed 
to the relative differences in scale values for 
the respective stimuli. This interaction (be- 
tween the effects of distribution and stimulus 
value) appears to be almost as great as for the 
corresponding mean judgment functions (Figure
16), and the forms of these intertwining
functions are quite similar. Thus the variations
of stimulus context which yield systematic
shifts in the mean judgments appear to
have similar effects upon the discriminability 
of the stimuli as determined by the Thurstone 
scaling. The mean discrepancy between the 
obtained proportions and the best-fit propor- 
tions based upon the model was only .022, 
almost identical to the mean deviation found 
for the Rectangular distribution of squares.

The four remaining distributions in Table
1, Low Midpoint, High Median (LH), High 
Midpoint, Low Median (HL), Low Median 
(L), and High Median (H), were selected to 
further investigate the effects of variation in 
midpoint and median. The midpoint of LH 
distribution is two stimulus steps below the 
midpoint for HL distribution, twice as great 
as the prior difference in midpoint. The 
medians also vary for these new distributions 
(by slightly more than one stimulus step), the 
lower median going with the higher midpoint. 
These two distributions were presented for 
judgments of numerousness, with random recording,
five categories, and six different orders
of presentation for each distribution. The
adaptation levels are presented in Table 7.
In spite of the opposing (but relatively smaller)
difference in median, adaptation level is
significantly higher for HL condition (t =
3.65, df = 62, p < .001).
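
As a check on the arithmetic, the t ratio can be recomputed from summary statistics. The sketch below is hedged on several assumptions: the five-category entries of Table 7 are read as M = 4.41, SD = .69 for LH and M = 4.95, SD = .47 for HL with 32 Ss each, the standard deviations are taken as n - 1 estimates, and a pooled-variance t for independent groups is assumed since the text does not say which form was used. The function name is hypothetical.

```python
import math


def t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance t for two independent groups, from means, SDs, and Ns.

    Returns (t, df).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (m1 - m2) / se, df


# HL versus LH, five categories (values as read from Table 7; see assumptions).
t, df = t_from_summary(4.95, 0.47, 32, 4.41, 0.69, 32)
print(round(t, 2), df)
```

Under these assumptions the result comes to about 3.66 on 62 degrees of freedom, in close agreement with the reported value of 3.65.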

The LH and HL distributions were also 
presented for magnitude estimation in terms 
of zero-centered scales, with zero as “average” 
and positive and negative numerals to indicate 
deviations from average. As indicated in Table 
7, the standard deviations of these adaptation 
levels (each adaptation level being the mean 
of the stimuli judged zero) are higher than the 
standard deviations for the corresponding five- 
category adaptation levels. In spite of the 
difference in response scales, the effects of the 
major difference in distribution, greater for 
the midpoint, are significant (t = 2.69, df = 58,
p < .01). An analysis of variance, performed
on all four LH and HL groups, indicates that 
the interaction between distribution and re- 
sponse scale is not significant (F < 1.0) but 
that the effects of response scale (F=5.67, 
df=1/120, p < .025) and distribution (F= 
18.85, df=1/120, p < .001) are both sig- 
nificant. 

TABLE 7

VARIATION IN MIDPOINT AND MEDIAN: ADAPTATION LEVELS FOR DIFFERENT SCALES OF NUMEROUSNESS

                                                 Scale of judgment
                                   Five categories    Centered at 0    Centered at 1,000
Distribution                        N     M    SD      N     M    SD     N     M     SD
Low Midpoint, High Median (LH)     32   4.41   .69    30   4.66  1.03   51?  5.57   .93
High Midpoint, Low Median (HL)     32   4.95   .47    30   5.41   .96   47   5.99  1.32

Fig. 21. Magnitude estimations of numerousness
for LH and HL distributions using unrestricted scales,
centered at zero. (Axes: mean judgment against
stimulus.)

The mean judgments for the zero-centered
response scales (Figure 21) were computed
following a modification of the procedure
described above for the original Low- and
High-Median conditions. For LH and HL conditions,
the denominator for each ratio was the
absolute value of that mean judgment by each
S which deviated most from zero. These extreme
judgments were applied to Stimulus 1
by two thirds of LH Ss, the other third applying
their most extreme judgments to Stimulus
7 (i.e., the highest stimulus value in LH
distribution). Approximately two thirds of HL Ss
applied their most extreme judgments to Stimulus
9, the other third to Stimulus 3 (i.e., the
lowest stimulus value in HL distribution).
These differences in procedure, dictated by the
asymmetry of the distributions of stimuli,
did not result in markedly different correction
factors. The mean denominator for each of
the four zero-centered conditions (Low Median,
High Median, LH, and HL) was on the
order of 36.0, with standard deviations also
approximating 36.0.
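
The rescaling step for the zero-centered judgments can be sketched as below. This is a minimal illustration assuming only what the paragraph states, namely that each S's mean judgments are divided by the absolute value of his most extreme mean judgment; the function names and the sample numbers are hypothetical, and the remainder of the original procedure (described earlier for the Low- and High-Median conditions) is not reproduced.

```python
def rescale_zero_centered(mean_judgments):
    """Express one S's mean judgments as ratios to the absolute value of
    the mean judgment that deviates most from zero (the denominator)."""
    denom = max(abs(j) for j in mean_judgments)
    if denom == 0:
        raise ValueError("all mean judgments are zero; no denominator")
    return [j / denom for j in mean_judgments]


def group_means(per_subject_judgments):
    """Average the rescaled judgments across Ss, stimulus by stimulus."""
    rescaled = [rescale_zero_centered(js) for js in per_subject_judgments]
    n = len(rescaled)
    return [sum(col) / n for col in zip(*rescaled)]


# Two made-up Ss using very different ranges of numerals.
print(group_means([[-40, -10, 0, 20], [-4, -1, 0, 2]]))
```

The division removes individual differences in the range of numerals, so that an S who uses large numbers does not dominate the group function.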

Mean judgments were computed directly 
for the five-category LH and HL conditions. 
These are presented in Figure 22 for compari- 
son with the zero-centered scales for the same 
distributions (Figure 21). The extraordinary 
correspondence between these two figures in- 
dicates that the effects of variation of the 
end values (i.e., midpoint) upon the general 
form and position of the scale of judgment are
independent of even such an extreme modification
in the categories of judgment (five-category,
verbally anchored versus unrestricted-numerical,
anchored only at zero). It should
also be noted that both pairs of functions 
(Figures 21 and 22) resemble in form the 
functions for the original Low- and High- 
Midpoint conditions (e.g., for the six-cate- 
gory square conditions shown in Figure 5) 
which utilized similar, but somewhat less reg- 
ular, distributions of stimuli. This invariance 
of form for such different response scales would 
appear to rule out any interpretation of the 
midpoint effect which stressed specific anchor- 
ing of the end categories at the end values of 
the distribution of stimuli. Since the end 
categories are completely unspecified in the 
instructions for use of the zero-centered scales 
of judgment, Ss were free to judge one extreme 
to be much closer in value to the average 
(zero) stimulus than they judged the other 
extreme. The fact that the differences between 
these extreme judgments and zero correspond 
so closely to the differences between the cate- 
gory judgments of these same stimuli and 
average (3) for the five-category conditions 
(relative to the total range of response values) 
suggests that any anchoring implicit in the 
instructions for the category judgments is not 
specific to that type of judgment scale. 

The LH and HL distributions of dots were 
also used for magnitude estimations with Ss 
instructed to judge the average stimulus 
“1,000.” Again six subgroups with different 
stimulus sequences were used for each condi- 
tion. Although the difference in adaptation 
levels (Table 7) is again in the expected direc- 
tion, both adaptation levels are much higher 
than the corresponding adaptation levels for 
the scales centered at 0. The 1,000-cen- 
tered instructions were also used following the 
same procedure for LH and HL distributions 
of squares (again with six subsequences and 
approximately 50 Ss for each condition). The 
respective adaptation levels and standard 
deviations are 5.86 and .94 for LH, 5.99 and 
1.49 for HL. These four magnitude-estimation 
conditions were combined in a 2x2 analysis 
of variance. Only the mean square associated 
with distribution is statistically significant 
(F=4.67, df=1/187, p < .05). It should be 
noted that the F ratio here is much smaller 
than the corresponding ratios obtained for 
other midpoint variations.

Fig. 22. Mean judgments of numerousness for LH
and HL distributions using five categories with
random recording. (Axes: mean judgment against
Stimuli 1-9.)

However, the median judgment functions for the
dots (Figure 23) are very similar to those obtained
using either the 0-centered scale (Figure 21) 
or the five-category scale (Figure 22), the 
only marked difference being a relative flatten- 
ing of the upper end of the five-category scale 
for HL distributions. The medians of Ss’ mean 
judgments were plotted in Figure 23 (instead 
of the means as in each of the previous figures) 
because of the extreme skewness of the distri- 
butions of mean judgments, particularly for 
Stimuli 6, 7, 8, and 9. Higher mean judgments 
are also found for LH than for HL, but the 
absolute differences in judgment are smaller 
for the means than for the medians. Stevens 
(1956) plots the medians in order to minimize 
the effects of skewing. The within-group vari- 
ations in these magnitude estimations are 
rather astonishing, the judgments of Stimulus 
9 varying all the way from 1,000 to 6,650 for 
HL distribution of dots. Another procedure 
required rescaling of each S’s mean judgments 
so as to eliminate the individual differences 
in the range of numerals applied to the stimuli 
(the procedure used for 0-centered scales
in Figures 9 and 21). This results in scales
which closely resemble those in Figure 23 but
with the relative differences in judgment somewhat
increased. Median judgments were also
computed for the 1,000-centered judgments of 
LH and HL distributions of squares. The re- 
sulting scales were similar in form to those for 
dots (Figure 23); but the slopes were flatter 
for the squares, and the smaller differences 
between corresponding median judgments were 
not systematic for the judgments of size. It 
thus appears that although significant mid- 
point effects are also obtained under the 1,000- 
centered instructions, these effects are less 
dramatic than with the category or 0-centered
scales, perhaps because of the great
inter-Ss variability with the larger numerals.


Fig. 23. Magnitude estimations of numerousness for
LH and HL distributions using unrestricted scales,
centered at 1,000. (Horizontal axis: stimulus.)

The median magnitude estimations (1,000
centered) were also plotted against the physical
values of the stimuli using logarithmic
coordinates. Contrary to the power-function
hypothesis of Stevens and Galanter (1957),
the log-log function was actually less linear
(somewhat negatively accelerated) for the dots
and no more linear for the squares than the
corresponding plots of median judgment
against stimulus number (as in Figure 23). 
The curvilinearity of the log-log plots for the 
dots could however be eliminated by breaking 
the functions into two separate segments, each 
of which would be fairly linear (as was sug- 
gested for Figure 12). A plot (not shown) of 
the category judgments (Figure 22) against 
the 1,000-centered judgments (Figure 23) 
confirmed that the relationship between these 
scales is linear (except for the judgment of 
Stimulus 9 under HL condition); however, 
the slope is steeper for LH than for HL dis- 
tribution. 
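
The linearity comparison on logarithmic coordinates can be made concrete with a least-squares fit. The sketch below is an illustration only: the function name is hypothetical, the data are made up to follow an exact power function, and the physical stimulus values used in the study are not reproduced here. A power function J = a * S^b plots as a straight line with slope b on log-log axes, so a squared correlation well below 1.0 indicates departure from the power-function form.

```python
import math


def loglog_fit(stimuli, judgments):
    """Least-squares line through (log10 stimulus, log10 judgment).

    Returns (slope, intercept, r_squared). The slope estimates the
    exponent of a power function; r_squared indexes linearity.
    """
    xs = [math.log10(s) for s in stimuli]
    ys = [math.log10(j) for j in judgments]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up judgments following J = 3 * S ** 0.5 exactly.
slope, intercept, r2 = loglog_fit([1, 100, 10000], [3, 30, 300])
print(slope, intercept, r2)
```

Fitting the two suggested segments separately amounts to calling the same function on the lower and upper stimuli in turn and comparing the two slopes.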

The two final distributions, L and H, are 
almost identical, respectively, to HL and LH 
distributions. As shown in Table 1, the only 
difference is that two of the presentations of 
both Stimuli 3 and 7 have been eliminated, 
with a corresponding addition of one presenta- 
tion each for Stimuli 1, 2, 8, and 9. The pur- 
pose of these modifications of HL and LH 
distributions was to provide a basis for evalu- 
ating the effects of minimal presentation of 
the extreme stimuli. Thus, H and L distribu- 
tions have identical end points; however, the 
added values occur so infrequently that the 
usual midpoint effects should be reduced in 
magnitude. The respective medians remain 
the same as for LH and HL distributions, and 
thus any shifts in adaptation level would re- 
flect the influence of occasional presentations 
of end stimuli. Two procedures were used for 
presenting both H and L distributions of dot
patterns. For 12 subgroups of Ss judging these 
distributions (each subgroup getting a dif- 
ferent order of presentation), judgments were 
made following the usual 45-presentation, un- 
judged preview of the patterns; but 12 addi- 
tional subgroups were not given the preview 
before the regular instructions for judgment. 
The purpose of this additional variation (pre- 
view versus no preview) was to determine 
whether the effects of infrequently presented 
end values are greater when Ss have been ex- 
posed to them before overt establishment of 
their scales of judgment. 

The resulting adaptation levels are quite 
close (i.e., no marked difference between the 
corresponding preview and no-preview condi- 
tions), and subsequent analysis for H and L 
distributions was based on the combined adaptation
levels obtained using both methods of
presentation. For the 69 Ss judging L distri- 
bution of dots, the mean adaptation level is 
4.64, with a standard deviation equal to .57; 
for the 69 Ss judging H distribution, the mean 
is virtually identical, 4.62, with a standard de- 
viation equal to .70. The analysis of variance 
comparing adaptation levels for L, H, LH, and 
HL distributions of dot patterns indicates that 
the additional end values significantly reduce 
the difference in adaptation level associated 
with midpoint (for interaction, F=7.37, 
df = 1/198, p < .01). The finding that 
there is no difference between the adapta- 
tion levels for H and L distributions suggests 
that although the addition of the extreme stim- 
uli reduces the effective difference in mid- 
points, it does not completely eliminate it. 
It appears that in spite of the loss in effective 
value of the extreme stimuli due to the infre- 
quency of their presentation, sufficient influ- 
ence remains to counteract the expected effect 
of the difference between the medians of the 
H and L distributions. 

Summary. These results provide further
support for the conclusions drawn from the 
first three sections. The characterization of 
the stimulus distributions with respect to meas- 
ures of central tendency is more complex in 
this section since here both the midpoint and 
the median take different values than the 
mean. However, these distributions are actu- 
ally simpler, more regular, and perhaps more 
representative, especially when midpoint and 
median differ in opposite directions from the 
mean. Since all of the distributions compared 
in this section had the same mean, differences 
in adaptation level are not consistent with the 
theory of adaptation level. Strong evidence is 
again obtained that adaptation level varies 
directly with independent variation of either 
the midpoint or median. 

This further support for the range-frequency 
compromise is more clearly demonstrated 
through analysis of the scales of judgment. 
The hypothetical mean judgment functions, 
constructed to show how the stimuli would 
have been judged if Ss had used their cate- 
gories with the same frequencies as they were 
used when the stimuli were presented with 
equal frequencies, again exaggerate the intertwining
of the obtained scales of judgment.
When the latter are linear, as with the original
Low- and High-Midpoint distributions,
the differences between the proportionate- 
frequency functions tend to be in the opposite 
direction from the obtained differences in judg- 
ment. This is consistent with the range-fre- 
quency compromise since the tendency to 
subdivide the range into proportionate sub- 
ranges would produce differences in the ob- 
tained direction. The implication is that the 
obtained differences would have been even 
greater were it not for the tendency to use the 
respective categories of judgment with fixed 
relative frequencies. Rescaling of the judg- 
ments in accordance with the Thurstone 
model (i.e., using the method of successive 
intervals) has little effect upon the forms of 
the judgment functions so that the intertwin- 
ing predicted from the range-frequency com- 
promise is also obtained when the stimuli are 
scaled in terms of their discriminability. 

Evidence is presented for the generality of 
the midpoint effects over a variety of re- 
sponse scales. The mean judgment functions 
are quite similar whether Ss are restricted to 
five, verbally anchored categories, whether 
they use positive and negative numbers to 
indicate how much more or how much less 
numerous than average the respective dot 
patterns appear, or whether magnitude-esti- 
mation scales are anchored to the number of 
dots in the average pattern (1,000-centered 
scales). This indicates that the range-frequency
compromise, and particularly the midpoint
effect, does not depend upon implied
anchoring of the end categories. Both cate- 
gory and magnitude-estimation scales reflect 
the hypothesized division of the range of 
stimuli so that fixed proportions of the scale 
of judgment are applied to fixed proportions 
of the range of stimuli. 

The results of this section also suggest that 
the effectiveness of the stimulus extremes as 
anchors for the two ends of the scale of judg- 
ment is reduced when the end stimuli are not 
presented with sufficient frequency. However, 
endpoint effects could be inferred from the data 
even when one of the end stimuli was presented
only once in each block of 45 presentations.
The role of the stimulus extremes in the
determination of judgment thus appears to be
out of all proportion to their actual frequency
of presentation.


Discussion 
Adaptation Level Theory 


Midpoint. The effects of independent vari- 
ation of the two end values of the distribution 
of stimuli (or of their mean, the midpoint) 
are the most clearly demonstrated of the dis- 
tribution effects studied in this research. With 
mean and range held constant, and with 
median also constant or else varying in the 
opposite direction, adaptation level shifts di- 
rectly with midpoint for the Low- and High- 
Midpoint distributions of successively pre- 
sented dots, squares, and weights, for the same 
distributions of simultaneously presented dots, 
and for LH and HL distributions of succes- 
sively presented dots (with five categories of 
judgment and also with unrestricted scales, 
centered at either 0 or 1,000). In each case,
these shifts in adaptation level are accom- 
panied by corresponding shifts over the entire 
scale of judgment, the function relating judg- 
ments and stimuli being linear with equal 
slopes for each matched pair of Low- and 
High-Midpoint conditions. The only failure 
to obtain a significant shift in adaptation level 
with shift in midpoint appeared in the fac- 
torial analysis of the four median-midpoint 
combinations for judgments of size (Low Mid- 
point-Median; Low Midpoint, High Median; 
etc.). However, further analysis of the scales 
of judgment indicated that if the propor- 
tionate-frequency hypothesis is accepted, pro- 
portionate-width effects are obtained even for 
these distributions. 

These midpoint effects provide more gen- 
eral support for the conclusions of the re- 
search on the effects of independent variation 
of the midpoints of distributions of simultaneously
presented numerals (Parducci et al.,
1960), successively presented numerals (Parducci
& Marshall, 1961a), and simultaneously
presented lines (Parducci & Marshall, 1961b). 
The dimension of numerical magnitude may 
be peculiarly abstract, judgments on this di- 
mension reflecting verbal habits not used with 
the psychophysical dimensions which have 
provided the empirical foundations for the development
of adaptation level theory; and it
may be particularly easy to remember the end
values when these are presented in numerical 
form. It is also possible that the method of 
simultaneous presentation, used for most of 
the previous research on the midpoint, accen- 
tuates the extreme stimuli. Since spatial posi- 
tions of the simultaneously presented stimuli 
were ordered with respect to stimulus magni- 
tude, the end values were more prominently 
accessible as context against which the other 
stimuli could be compared. But since the 
same midpoint effects are demonstrated here 
with psychophysical dimensions and with suc- 
cessive and random presentation, it would ap- 
pear that there is now sufficient evidence to 
warrant incorporating the midpoint effects 
into the theory of adaptation level. 

Although previous adaptation level equa- 
tions (Helson, 1959; Johnson, 1955) have 
given equal weighting to each stimulus in the 
series presented for judgment (except insofar 
as the attention of S is experimentally directed 
to specific values), the basic approach would 
not be altered by giving increased weighting to 
the two end values. It is here proposed that 
the end values of the distribution of series 
stimuli be treated in the adaptation level ap- 
proach either as though they had been pre- 
sented with greater frequency than the other 
stimulus values or as though E had specifically
anchored the scale of judgment to these two 
end values (e.g., by instructing S to apply his 
middle category of judgment to stimuli which 
appear halfway between these two stimuli). 
This proposed change in emphasis would in- 
corporate Volkmann’s (1951) approach to 
absolute judgment into the more general 
theory of adaptation level. An obvious direc- 
tion for development of the theory is toward a 
rationale for weighting different features of 
the stimulus situation. The present research
on the effects of independent variation of mid- 
point represents a step in that direction 
by providing an empirical basis for greater 
weighting of the stimulus extremes. 
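
The proposed change in weighting can be sketched numerically. The code below is an illustration under loud assumptions: adaptation level is represented only as a frequency-weighted logarithmic (geometric) mean of the series stimuli, leaving out any background or residual terms of the full equation; the end values are treated "as though they had been presented with greater frequency" by multiplying their frequencies by a hypothetical factor `end_weight`; and the function name and numbers are invented for the example. No particular value of the weight is implied by the data.

```python
import math


def weighted_adaptation_level(values, freqs, end_weight=1.0):
    """Frequency-weighted geometric mean of the series stimuli, with the
    frequencies of the two end values multiplied by end_weight.

    end_weight = 1.0 gives equal weighting of every presentation;
    larger values pull the result toward the geometric midpoint of
    the two end values.
    """
    presented = [(v, f) for v, f in zip(values, freqs) if f > 0]
    lowest = min(v for v, _ in presented)
    highest = max(v for v, _ in presented)
    num = 0.0
    den = 0.0
    for v, f in presented:
        w = f * (end_weight if v in (lowest, highest) else 1.0)
        num += w * math.log(v)
        den += w
    return math.exp(num / den)


# Made-up series: the upper end value lies far from the other stimuli.
values = [1, 10, 1000]
freqs = [1, 1, 1]
print(weighted_adaptation_level(values, freqs))                # equal weighting
print(weighted_adaptation_level(values, freqs, end_weight=5))  # ends emphasized
```

With equal weighting the result depends only on the mean of the log values; raising the weight on the end values makes it track the midpoint of the range, which is the direction of the midpoint effects reported here.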

Median. Independent variation of the 
median of the distribution of stimuli produces 
significant variation in adaptation level for the 
Low- and High-Median distributions of suc- 
cessively presented dots, squares, and weights, 
and also for both simultaneous and successive 
presentations of the Low- and High-Median
(dots) distributions. This direct variation
of adaptation level with the median is ob- 
tained using different numbers of categories of 
judgment (4, 5, or 6) and also using the 
unrestricted numerical scales, centered at 0, 
or with 1,000 anchored to the background. 
The factorial analysis for the four midpoint- 
median combinations (Low Midpoint-Me- 
dian; Low Midpoint, High Median; etc.) 
also indicated that the effects of the median 
were statistically significant. 

As with the variation in midpoint, the 
present median effects are consistent with the 
previous conclusions based on judgments of 
numerical magnitude and of visual length. 
The previous analysis of the judgments of the 
simultaneously presented lines (Parducci & 
Marshall, 1961b) had indicated that adapta- 
tion level is sometimes a poor index to the 
scale of judgment since two scales of judgment 
might be quite similar with respect to adapta- 
tion level and end values and yet differ mark- 
edly at intermediate values. The tabulation 
and statistical analysis of the scales of judg- 
ment for the present Low- and High-Median 
distributions and also for the Midpoint-Me- 
dian distributions reveal that these scales ac- 
tually intertwine, the differences in judgment 
being in one direction for the middle region of 
the scale but in the opposite direction at both 
outer regions. 

The effects of variation in median present a 
more difficult challenge to the theory of adap- 
tation level than do the midpoint effects, for 
here it is not simply a matter of weighting certain
stimuli more heavily.⁸ Even if some appropriate
weighting could be found to describe
adaptation level, the judgments of the respec- 
tive stimuli are clearly not represented by a 
linear function of the ratio of each stimulus to 
adaptation level—as asserted by the adapta- 
tion level equation for judgment (Helson, 
1959, p. 586). 

⁸ A. Ahumada has recently suggested to the author
that the median effects might be incorporated into
Helson's general equation by weighting each
presentation in inverse proportion to its stimulus
distance from the midpoint. A similar procedure for
weighting would have to be applied around each of
the other proportionate-width limens in order to
describe the rest of the scale of judgment.

Mean. Independent variation of the mean
of the distribution of successively presented
squares, dots, and weights, and of simultane- 
ously presented dots, does not result in marked 
variation in adaptation level. The difference 
in adaptation level is significant for only one 
of the comparisons, Low- versus High-Mean 
(gap); and then the difference in adaptation 
level is in the opposite direction from the dif- 
ference between the means. Considered along 
with the previously reported research (Par- 
ducci et al., 1960; Parducci & Marshall, 
1961a, 1961b), these results appear inconsist- 
ent with the basic adaptation level equation 
in which adaptation level varies directly with 
the logarithmic or geometric mean of the 
series stimuli (Helson, 1959, pp. 579-582). 
Since the present stimulus values are quasi- 
logarithmically spaced, the relationships be- 
tween the arithmetic means of the distribu- 
tions when the stimuli are scaled 1-9 reflect 
the corresponding relationships between the 
geometric means of the stimulus distributions 
when the physical values of the stimuli are 
used (both sets of means are presented in
Table 1). The failure to demonstrate any
systematic relationship between these means 
and the adaptation levels when midpoint and 
median are held constant should raise doubts 
concerning the general statement of adapta- 
tion level theory as an averaging process or 
pooling of all stimulus values. Sizable portions 
of the stimulus distribution can be shifted up 
or down with no effect upon adaptation level, 
unless either the median or midpoint is also 
shifted. 

Although these negative results cannot dis- 
prove this pooling feature of the theory of 
adaptation level, they indicate the appropri- 
ateness of further scrutinizing the evidence for 
the pooling or averaging process. For each 
demonstration that adaptation level has varied 
with the mean, it should be determined 
whether the variation could not also be at- 
tributed to shifts in either the midpoint, the 
median, or in both the midpoint and median. 
These three alternative measures of central 
tendency of the distribution of stimuli usually 
vary together unless special concern has been 
taken in the construction of the distributions 
of stimuli. And since the mean is intermediate
between the midpoint and median in most 
natural distributions, its previous success as a 


predictor of adaptation level may be entirely 
due to its providing a rough approximation to 
some average of the other two parameters. 
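The relation among the three measures can be illustrated with a minimal sketch. The series below is hypothetical (it is not one of the distributions in Table 1) and is chosen only to be positively skewed on a 1-9 scale; the function name is likewise an assumption made for the illustration.

```python
from statistics import mean, median

def central_measures(values):
    """Return (midpoint, median, mean) for a list of stimulus scale values."""
    # Midpoint: halfway between the two end values of the series.
    midpoint = (min(values) + max(values)) / 2
    return midpoint, median(values), mean(values)

# Hypothetical positively skewed series on a 1-9 scale (illustration only).
skewed_series = [1, 1, 1, 2, 2, 3, 4, 6, 9]
midpoint, med, avg = central_measures(skewed_series)

# For this series the mean falls between the median and the midpoint.
mean_is_intermediate = med < avg < midpoint
```

In such a series a shift in the mean is ordinarily accompanied by a shift in the midpoint, the median, or both, which is why the three measures must be deliberately unconfounded before the mean can be credited with any effect.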

The scales of judgment for the Low- and 
High-Mean conditions (Figure 13) further 
illustrate the problem which nonlinear re- 
sponse scales raise for the theory of adapta- 
tion level. These two conditions yield vir- 
tually identical adaptation levels, and yet the 
mean judgment functions differ markedly in 
the directions described by the hypothetical 
proportionate-frequency functions. Whenever 
such functions are different in form, neither 
adaptation level nor any other single response 
measure can provide a very useful index to the 
differences between the scales of judgment. 
Since the theory of adaptation level organizes 
the entire scale of judgment around adaptation 
level with no weighting for differences in stim- 
ulus frequencies except insofar as these affect 
adaptation level (Helson, 1959, p. 585), the 
present data indicate that this basis for char- 
acterizing the stimulus judgment relationships 
should be modified. 


Range-Frequency Compromise 


Proportionate Subranges. The evidence 
presented for the usefulness of the midpoint 
as a predictor of adaptation level is consistent 
with the assumption that Ss tend to divide the 
range so that the categories of judgment are 
applied to proportionate subranges, measured 
on a scale of equal discriminability. For each 
of the present stimulus dimensions, the mean 
judgments are linearly related to the discrim- 
inability scale values when the successive 
stimuli are somewhat equally discriminable 
and are presented with equal frequency (i.e.,
for the Rectangular distributions, Figures 2, 3, 
and 4). The middle categories are used with 
approximately equal frequency and cover ap- 
proximately the same psychological differences 
in terms of the discriminability criteria. The 
widths of the two end categories are indeter- 
minate (no lower limen for the lowest cate- 
gory, no upper limen for the highest cate- 
gory), and they are used much less 
frequently. The simplest hypothesis consistent 
with the data is that the discriminability of 
the difference separating the lower and upper 
limens is the same for all categories. The actual
physical ratios of the upper to the lower
limens remain fairly constant over the differ- 
ent categories for weight and numerousness 
but tend to decrease for the upper size cate- 
gories. 

Insofar as Ss do in fact apply their cate- 
gories to subranges spanning equal stimulus 
ratios, judgments should be logarithmically 
related to stimulus values. This is the rela- 
tionship entailed by Helson’s law of judgment 
(Helson, 1959, p. 586), assuming that adap- 
tation level is not varying systematically with 
the value of the stimulus presented for judg- 
ment. It has considerable empirical support, 
both in adaptation level literature and in the 
work on psychophysical scaling (e.g., Garner, 
1958; Stevens & Galanter, 1957). It is also 
supported by the various functions relating 
mean judgment to stimulus number in the 
present report, all of which are roughly linear. 
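The logarithmic relationship can be sketched as follows, assuming that the limens divide the range into subranges of exactly equal stimulus ratio. The range (1 to 32), the five categories, and the test stimuli are hypothetical values chosen for convenience, and the function names are illustrative only.

```python
import math

def equal_ratio_limens(low, high, n_categories):
    """Limens dividing [low, high] into subranges that span equal stimulus ratios."""
    ratio = (high / low) ** (1 / n_categories)
    return [low * ratio ** k for k in range(1, n_categories)]

def judge(stimulus, limens):
    """Category number (1-based): one more than the number of limens reached."""
    return 1 + sum(stimulus >= limen for limen in limens)

# Hypothetical range of 1 to 32 divided among five categories (limens near 2, 4, 8, 16).
limens = equal_ratio_limens(1, 32, 5)

# Hypothetical stimuli, each lying inside one subrange and spaced by a constant ratio of 2.
stimuli = [1.5, 3, 6, 12, 24]
judgments = [judge(s, limens) for s in stimuli]

# Each unit step in judgment corresponds to a constant step in log stimulus value.
log_steps = [math.log2(b) - math.log2(a) for a, b in zip(stimuli, stimuli[1:])]
```

Under these assumptions the judgments rise by one category for each doubling of the stimulus, so that judgment is a linear function of the logarithm of the stimulus value.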

Variations in the end values affect the slope 
but not the linearity of the mean judgment 
functions. Thus the slope tends to be greater 
when the range is smaller (compare the slopes 
for dots in Figures 2 and 22 and for size in
Figures 2 and 5), but the slopes tend to be
equal for stimuli covering similar logarithmic 
ranges and ranges which are similar with re- 
spect to discriminability (e.g., LH and HL 
conditions in Figure 21; also for Figures 22 
and 23). The mean judgment of any given 
stimulus varies inversely with the midpoint of 
the stimulus distribution, and the mean stim- 
ulus to which each category is applied varies 
directly with the midpoint. The fact that the 
empirical functions are so consistent, in spite 
of the unconfounding of midpoint and mean 
in this research, lends credence to the propor- 
tionate-subrange assumption. 

What restrictions must be placed upon the 
role attributed here to the two end values? 
Insofar as the adaptation level approach has 
been correct in weighting respective stimuli, 
other things being equal, in accordance with 
their frequency of presentation, the effect of 
variation in the end values (i.e., the mid- 
point) should increase with an increase in 
their relative frequency of presentation. This 
variable was not investigated in the present 
study, but some effort was made to assess the 
lower limit of presentation frequencies for 
these values. Stimuli 1 and 2 for L condition 


and Stimuli 8 and 9 for H condition were each 
presented only once in each block of 45 pres- 
entations. The resulting adaptation levels in- 
dicate that even such infrequent presentations 
have significant effects upon the scale of judg- 
ment, though the effects are not as marked as 
would be expected from more frequent pres- 
entations. It might be hypothesized that S 
quickly anchors his end categories, hesitating 
to reanchor them to more extreme values inso- 
far as these are presented only infrequently. 
However, the fact that adaptation level was 
not affected by an additional, unjudged pre- 
view, which included the rare end values, 
argues against this hypothesis. Perhaps it 
would be more reasonable to assume that there 
is some forgetting of infrequently presented 
end values. Since the two lowest patterns (8 
and 12 dots) are more accurately identified 
and presumably more easily remembered than 
the two highest patterns (95 and 122 dots— 
which were systematically underestimated, 
Figure 12), the adaptation levels for L and H 
conditions should be more similar to the adap- 
tation level for LH (which lacked Stimuli 8 
and 9) than to the adaptation level for HL 
(which lacked Stimuli 1 and 2). And this is 
the case. 

If memory for the end values is crucial to 
(or associated with) the fixation of the scale 
of judgment, midpoint effects should be less 
for lifted weights. The dimension of weight 
would appear to be more difficult to anchor 
than numerousness (where the number of dots 
can actually be counted for some patterns) or 
size (where each square is surrounded by a 
visual background of constant size). Al- 
though the interaction between the effects of 
midpoint and stimulus dimension was not 
statistically significant, the difference in adap- 
tation level associated with midpoint was 
only half as great for the weights. 

This hypothesized forgetting of end values 
may also be relevant to the interpretation of 
the puzzling direction-of-shift effect (Par- 
ducci, 1956a). The results of dropping out the 
four heaviest weights (described above in the 
postshift-judgment section) appear to reflect 
a much more dramatic readjustment of the 
scale of judgment than the results previously 
reported for a similar restriction of the range 
of sizes of squares. The direction-of-shift 




effect (i.e., greater shift in judgment when
the range is extended than when it is re- 
stricted) may thus be related to S’s memory 
for the end values of the preshift distribution. 
This presents no problem with well-anchored 
dimensions (e.g., size) where the major con- 
cern is with respect to the permanence of an 
easily discriminated shift (Parducci & Hohle, 
1957). But with lifted weights, the remem- 
bered values of PSEs for the omitted stimuli 
shift systematically toward the other values 
presented for judgment (Parducci & Marshall, 
1962). Thus the direction-of-shift effect may 
be another manifestation of the crucial role 
of the end values in the determination of the 
scale of judgment. Extension of the range 
immediately shifts the end values, and adap- 
tation level shows a corresponding shift long 
before the mean of the stimuli has shifted 
markedly toward its asymptotic postshift 
value. Restriction of range may shift adapta- 
tion level only insofar as the PSEs for the 
preshift end values shift systematically toward 
the postshift end values. Presumably, there 
must also be some kind of weighting for in- 
formation about the permanence of the shift 
and also for recency. End values which are no 
longer presented, regardless of how well they 
are remembered, must eventually cease to 
anchor the end points of the scale of judg-
ment.

The results of the midpoint variations us- 
ing unrestricted numerical scales, centered at 
either 0 or 1,000, suggest that the proportion- 
ate-subrange tendency operates even when S 
is free to select his own end categories and to 
further subdivide the range. Almost three 
times as many different responses are used, 
on the average, by Ss free to select their cate- 
gories (as by Ss restricted to five categories). 
Furthermore, the 0-centered and 1,000-cen-
tered judgments of the respective end stimuli
differ markedly for distributions with dif- 
ferent midpoints. As shown in both Figures 
21 and 23, Stimulus 9 is judged greater than 
Stimulus 7, even though both are the upper- 
most stimulus values for their respective dis- 
tributions and both are judged Very many 
on the five-category scale (Figure 22). But 
as with the original midpoint distributions 
(Figure 5), this trend toward constancy is 
not carried to the point where a given stimu- 


lus elicits the same judgment for both Low- 
and High-Midpoint distributions. It thus 
appears that S freely selects categories (nu- 
merals) so that differences in judgment re- 
flect the discriminability of the corresponding 
physical values, the center of the response 
scale tending toward the geometric midpoint 
of the end stimuli. With complete freedom 
to assign different judgments to the end 
stimulus, the proportionate-subrange tendency 
need not produce a midpoint effect. The 
fact that it does indicates that the center of 
the response scale is anchored to the mid- 
point by S himself. This finding seems con- 
sistent with the emphasis placed upon the 
center of the response scale, or upon adapta- 
tion level, by the theory of adaptation level. 

Proportionate Frequencies. The predictions
from the proportionate-frequency assumption 
provide for a much greater variety of func- 
tions relating judgment to stimulus since local 
shifts in slope would be produced by almost 
any manipulation of the distribution of stimuli 
—and not just shifts in overall slope as pro- 
duced by manipulation of the end values. The 
customary index to the scale of judgment, 
adaptation level, would be easily described 
for a two-category scale if both categories were 
used with the same frequency and Ss kept a 
perfectly ordinal scale; for adaptation level 
would then be the median stimulus value. 
With a six-category scale, adaptation level 
would be the median only if the top three 
categories were used with the same frequency 
as the bottom three. This balance of fre- 
quencies around the middle limen is obtained, 
at least approximately, for the Rectangular 
distribution (for which the equal-subrange 
tendency would also yield equal frequencies). 

Although adaptation level varies in the ex- 
pected direction when the median is varied, 
the proportionate-frequency tendency was 
more impressively demonstrated through the 
analysis of the nonlinearity (quadratic com- 
ponent) of the mean judgment functions. 
Thus it was shown that, in each case, the hypo- 
thetical fixed-frequency function exaggerates 
the obtained deviations from linearity (e.g.,
Figures 8 and 17). Since an equal-subrange 
tendency would, in itself, produce linear mean 
judgment functions, the deviations in the 
direction of the hypothetical proportionate- 




frequency functions (i.e., toward what the 
mean judgment functions would have been 
had Ss used each category with the same fre- 
quency it had been used for the corresponding 
Rectangular condition) add convincing sup- 
port to the assumption that Ss tend to use 
their categories with fixed relative frequencies. 
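A hypothetical proportionate-frequency function of this kind can be sketched as follows. The sketch assumes a perfectly ordinal scale: presentations are ranked by stimulus value and the categories are filled from the bottom up until each has received its fixed share. The stimulus counts and the two-category scale are hypothetical, not values from the present distributions, and the function name is illustrative.

```python
def fixed_frequency_judgments(stimulus_counts, category_proportions):
    """Mean category per stimulus when ranked presentations fill the categories
    in order, each category receiving a fixed proportion of all presentations."""
    total = sum(stimulus_counts)
    # Cumulative upper bound (in ranked presentations) of each category.
    bounds, running = [], 0.0
    for proportion in category_proportions:
        running += proportion * total
        bounds.append(running)
    means, assigned = [], 0
    for count in stimulus_counts:  # counts are assumed to be positive
        categories = []
        for i in range(count):
            position = assigned + i + 0.5  # centre of this presentation's rank slot
            category = next(
                k for k, bound in enumerate(bounds, start=1)
                if position < bound or k == len(bounds)
            )
            categories.append(category)
        assigned += count
        means.append(sum(categories) / count)
    return means

# Hypothetical four-stimulus series judged with two equally used categories.
rectangular = fixed_frequency_judgments([2, 2, 2, 2], [0.5, 0.5])
low_median = fixed_frequency_judgments([4, 2, 1, 1], [0.5, 0.5])
```

With equal presentation frequencies the limen falls between the second and third stimuli; when the lowest stimulus is presented most often, the limen moves down to the median of the series, and the second stimulus is now placed in the upper category.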

Further support and also some basis for 
further specification of the proportionate- 
frequency tendency was obtained with the 
0-centered and background-1,000 scales. Here, 
there were no set categories to use with 
fixed frequencies, and of course there were 
large individual differences in the range of 
numerals actually used as judgments. The 
standard deviation of the correction factors 
for range was approximately equal to half 
the mean range of judgments for the 0-cen- 
tered conditions. Nevertheless, the adapta- 
tion levels and also the corrected mean 
judgment functions show the characteristic 
differences found when Ss were restricted to 
just six categories of judgment. In terms of 
the proportionate-frequency hypothesis, this 
suggests that S may also tend to use numer- 
als from different portions of his total range 
of responses with fixed relative frequencies. 
The crucial proportionate-frequency habit 
may thus fix the use of different proportion- 
ate subranges of the scale of judgment rather 
than the use of specific verbal responses. If
the verbal anchoring of the response cate- 
gories related them to unequal portions of 
the range of judgment (e.g., small, average, 
slightly larger than average, large, very large), 
Ss might tend to use the categories with 
markedly disparate frequencies (i.e., Ss re- 
stricted to only one category below the center 
of the scale might use that category just as 
frequently as they use a greater number of 
permissible categories above the center of 
their scale of judgment). Furthermore, if 
the categories above the center were unevenly 
anchored (e.g., medium, large, very large, 
very very large, very very very large—or 5, 
7, 9, 10, 11), Ss might tend to use one cate-
gory just as often as several other categories
combined (e.g., 7 just as often as 9, 10, 
and 11). The evidence from the 0-centered 
scales is only suggestive (Figures 9 and 21), 
but both sets of functions tend to be flatter 
for portions of the stimulus range where the 




patterns are presented relatively infrequently; 
and this tendency toward curvilinearity is 
quite dramatic for the background-1,000 con- 
ditions (Figure 11). 

Method of Recording. For most of the 
distributions of stimuli presented in this re- 
search, different subgroups of Ss recorded 
their judgments either in random order (i.e., 
consecutively down the response sheet, in the 
order of presentation) or in columns headed
by the appropriate category labels. With
random recording, S wrote down one of five
or six category numerals (i.e., 1-5 or 1-6); 
with column recording, S recorded the num- 
ber of the presentation, 1-90, announced by 
E with each presentation. The method of
recording thus constituted a major independ- 
ent variable, with primary interest in the 
interaction between its effects and the effects 
of variation in one or another of the param- 
eters of the distribution of stimuli. The 
purpose of studying this interaction was to 
further evaluate the assumption that S tends 
to use the different categories of judgment 
with proportionate frequencies. Column re- 
cording provides each S with a continuous 
record of the cumulated frequencies of his use 
of each of the categories. He need only com- 
pare the length of the columns of presenta- 
tion numbers across his response sheet to ob- 
tain an easily assessed and perfectly accurate 
record of these response proportions. Insofar 
as S divides the range into equal subranges, 
his category columns will be quite unequal 
in length (i.e., use) for any of the Low- or 
High-Median distributions. The experimental 
hypothesis was that S would compensate for 
these inequalities through increased use of 
the shorter category columns. It was assumed 
that this balancing would be less marked for 
the random recording conditions since it 
would be harder for S to assess the frequen- 
cies of categories scattered randomly about 
the page. Thus the hypothesis was that the 
effects of independent variation in the median 
stimulus value would be greater for column 
recording than for random recording. Follow- 
ing the same line of reasoning, the median 
effects should be least for the E recording 
conditions (used only with lifted weights);
for with E recording, S has no record at all of 
his own responses.




The results provide limited support for 
these hypotheses. The interaction between 
distribution and method of recording was not 
statistically significant in any of the analy- 
ses of variance with which the hypothesized 
effects of recording could be tested. How- 
ever, the directions of the obtained interac- 
tions were consistent with the expected effects. 
Thus, with variation in midpoint (where the 
proportionate-frequency tendency would re- 
duce the difference in adaptation level), adap- 
tation levels are twice as disparate for the 
random recording as for the column recording 
(Table 2). With variation in median (where 
the proportionate-frequency tendency would 
increase the difference in adaptation level), 
the difference in adaptation levels for the 
dots is half again as great for column record- 
ing as for random recording (Table 3). For 
lifted weights, the difference is 60% greater 
for column recording than for E recording.
However, there is no interaction at all for 
the Low- and High-Median size conditions
(except perhaps in Table 6 where the inter- 
action is in the expected direction in spite of 
having to work against an hypothetical inter- 
action with the difference in midpoint). It 
was pointed out in the Results section that 
the mean stimulus functions are more linear 
for both the Low- and High-Median condi- 
tions with random recording than with column 
recording. This was true for all three stimu- 
lus dimensions, and it is consistent with the 
hypothesized recording effects since the de- 
viations from linearity were in the direction 
of the hypothetical proportionate-frequency 
functions. 

Certain factors may work against these 
effects of recording. One possibility is that 
since the random recording S knows the total 
number of presentations (his response sheet 
is numbered 1-90), he might feel more pres- 
sure toward using his less frequent categories 
—particularly in the later portions of the 
series. In addition to having no information 
about the total number of presentations, many 
of the size-judging, category-column Ss could 
infer from the absence of responses in the 
Very large column on their response sheet 
that they had never placed the largest square 
in that column (assuming that there was 
considerable absolute identification of the 


largest square). Insofar as there is a general 
reluctance to shift categories for a specific 
stimulus (Parducci, 1959), the column re- 
cording S would be less inclined to increase 
his use of the missing or infrequently used 
categories. 

To test these afterthoughts, new column 
recording groups were exposed to the High- 
Median distribution of squares using instruc- 
tions modified to include the statement that 
the same 45 presentations would be repeated 
three times (the first 45 were preview). The 
instructions also stated, after the second set 
of 45, that S should feel free to change his 
judgment of specific stimuli. Comparison of 
the judgments under these instructions with 
those for the usual column recording revealed 
only a slight shift toward more equalized 
use of the different categories, providing little 
evidence for this line of reasoning. 

A comparison was also made of the fre- 
quencies of use of the categories for the first 
versus second 45 judgments of the Rectan- 
gular distribution of squares under each of 
the following three methods of recording: 
random, column with preview, and column 
without preview. Little shift was found in 
the scale of judgment in the course of the 
presentation series. In each case, the number 
of Very large judgments was actually slightly 
less for the second 45 judgments. 

There remains the possibility that column 
recording Ss are no more sensitive than ran- 
dom recording Ss to inequalities in the fre- 
quencies with which they have used their 
categories. Although recording in columns 
should provide a more accurate impression of 
these frequencies, some of the random record- 
ing Ss may have formed similar or perhaps 
even exaggerated impressions of the actual 
inequalities in category usage. Thus, the 
distributions of judgment were actually more 
symmetrical for the random recording than 
for the column recording, Rectangular con- 
ditions. Whatever the basis for this unex- 
pected difference, it would tend to reduce the 
expected interactions between the effects of 
distribution and recording.

The method of simultaneous presentation 
involves the most extreme capitalization upon 
the possibilities of providing Ss information 
about the frequencies with which they are 




using the categories. The previous use of this 
method for judgments of visual length (Par- 
ducci & Marshall, 1961b) had demonstrated 
significantly greater median effects when 
stimuli were spaced on the page so as to 
emphasize the frequencies with which differ- 
ent categories were used by S. The same 
spacing principle was used here for judgments 
of numerousness, and highly significant me-
dian effects were again obtained.


Nature of the Compromise. The propor- 
tionate-subrange and proportionate-frequency 
hypotheses have been presented here simply 
as empirical generalizations, closely related to 
both the data and common observation. Some 
Ss can even be coaxed into volunteering ver- 
bal characterizations of one or the other of 
these two tendencies of judgment. But why 
should Ss show these tendencies, and why 
should the scales of judgment represent a 
compromise between them? 

A functional interpretation can be made in 
terms of the role of judgment in the psy- 
chological description of the environment. 
How important for S are the physical differ- 
ences between stimuli? For simple perceptual 
dimensions, S answers this question by de- 
scribing how difficult it is for him to discrimi- 
nate between the stimuli. Insofar as the 
stimuli are discriminable, he puts them into 
different categories. Thus, the categories have 
lower limens when the stimuli are judged in 
order of increasing magnitude; and even if S 
knows the physical values of the stimuli he 
will subsequently be judging, he tends to run 
out of categories before he gets through the 
series (Parducci, 1959). Beyond this sim- 
plest evidencing of discrimination, S uses his 
categories to mark off equally discriminable 
differences on the stimulus dimension. Thus, 
the mean judgments are a linear function of 
the values of the stimuli on a Thurstone 
scale of equal discriminability (Figure 4). 
Physical description, the sort of activity that 
is accomplished with a yardstick, is clearly 
not the function of unanchored category scal-
ing (of the kind studied in this research);
for Ss are capable of much more accurate 
descriptions of the actual physical relation-
ships between the stimuli (as shown in Fig-
ure 12). The hypothesized proportionate-


subrange tendency serves to describe the 
discriminability of the stimuli for the ob- 
server making the judgment. 

Another function of judgment is to identify 
which of the alternative stimuli is being pre- 
sented on a particular occasion. This is the 
function which has been stressed in the in- 
formational analyses of absolute judgment 
(e.g., Garner & Hake, 1951). The hypo-
thesized proportionate-frequency tendency
serves to modify the characterization of the 
discriminability of the stimuli in the direction 
of more efficient identification. When the num- 
ber of different stimulus values exceeds the 
number of categories of judgment, more effi- 
cient identification can generally be achieved 
by using the available categories with equal 
frequencies. As an example, assume that the
middle weight (5) of the present experiment 
had been presented four times as often as 
each of the other eight weights. Assume also 
that S’s judgments of this distribution were 
restricted to just three categories. Since each
category could be restricted to just three 
different weights, the judgments could in- 
crease the probability of correct identification 
of each weight (i.e., the reciprocal of the 
number of the different stimulus values iden- 
tified by the same category) from one ninth 
(pure chance) to one third. If S instead re- 
stricted his middle category to presentations 
of the middle weight, he could unequivocally 
identify the stimulus on one third of the pres- 
entations. Although there would be some in- 
crease in the ambiguity of the other two 
categories, each including four instead of 
only three weights, the overall increase in 
probability of correct identification would be 
from one ninth to one half, i.e., 

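$$\tfrac{1}{3}\left(\tfrac{1}{4}\right) + \tfrac{1}{3}(1) + \tfrac{1}{3}\left(\tfrac{1}{4}\right) = \tfrac{1}{2}.$$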
The net increase in identification might ap- 
pear to be rather small for the array of dis- 
tributions studied in the present research (e.g., 
the probability of correct identification if the 
Low-Median distributions were judged using 
five categories in accordance with the equal 
discriminability rule would again be approxi- 
mately one third; this also would be increased 
to approximately one half if the five judg- 
ments were made with equal frequencies). 
However, the reduction in ambiguity would 




be much greater in nonlaboratory situations 
for which stimulus values from certain regions 
of the range occur with extraordinary infre- 
quency. Thus if three categories were applied 
in accordance with the equal discriminability 
rule to a normal distribution, the middle 
category would be used virtually always and 
would thus convey little information about 
which particular value was being judged. In 
many of the social situations in which habits 
of judgment are presumably acquired, there 
should be ample opportunity for the facilita- 
tion of proportionate-frequency habits which 
increase the efficiency of identification. How- 
ever, it is not implied here that such habits 
are verbalized with any precision or that S 
consciously attempts to identify the stimuli 
or even that S always maximizes information. 
In fact, stimulus identifications may actually
be more accurate when S makes unanchored 
absolute judgments than when he is instructed 
to identify the stimuli (demonstrated ex- 
perimentally in Parducci & Marshall, 1962). 
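The arithmetic of the weight example can be checked with a short sketch. It takes the probability of correct identification within a category to be the reciprocal of the number of different stimulus values assigned to that category, as stated in the text, and weights each category by its share of the presentations. The function and variable names are illustrative assumptions.

```python
from fractions import Fraction

def identification_probability(presentation_counts, categories):
    """Overall probability of correct identification, taking the probability
    within a category as 1 / (number of different stimuli in that category)."""
    total = sum(presentation_counts.values())
    probability = Fraction(0)
    for members in categories:
        # Share of all presentations that fall in this category.
        share = Fraction(sum(presentation_counts[s] for s in members), total)
        probability += share * Fraction(1, len(members))
    return probability

# Nine weights; the middle weight (5) is presented four times as often as each other weight.
counts = {weight: 1 for weight in range(1, 10)}
counts[5] = 4

three_per_category = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
middle_weight_alone = [[1, 2, 3, 4], [5], [6, 7, 8, 9]]

p_three_per_category = identification_probability(counts, three_per_category)
p_middle_weight_alone = identification_probability(counts, middle_weight_alone)
```

The first partition yields one third and the second yields one half. In the second partition each category also receives one third of the presentations, so that this is the equal-frequency use of the three categories.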

This discussion has assumed that given 
degrees of discriminability or given proba- 
bilities of correct identification have equal
importance over the dimension of judgment. 
However, if S were more concerned with 
stimulus differences in one region of stimu- 
lus values or if he were more concerned to 
identify stimuli in that region, the widths 
and frequencies of use of the categories would 
tend to be smaller there. Thus in fitting a 
size (e.g., of shoes), a wealth of finely differ- 
entiated categories might be used for the re- 
gion bordering the proper size (e.g., “just a
little bit too large”) while a wide range of 
markedly deviate sizes might be included in 
the same category (e.g., “too large”). Ex- 
perimental evidence consistent with this pos- 
sibility has recently been presented by Tajfel 
(1959). 

Consideration of motivational factors, the 
relative value to S of different stimuli or re- 
gions, would complicate many applications of 
the range-frequency compromise. Successful 
prediction of judgment from stimulus condi- 
tions is also complicated by the phenomena 
of perceptual grouping. Perceptual grouping 
has been found in association with various 
features of the physical arrangement of si- 
multaneously presented lines (Parducci & 


Marshall, 1961b) and also with marked in- 
equalities in successive stimulus differences, 
as in the Low- and High-Mean (gap) condi-
tions of the present research. Stimuli which
appear to “go together” are likely to be 
placed in the same category. This sometimes 
results in category limens and judgments 
which differ significantly from the values 
which would be predicted from a range-fre-
quency compromise. However, the judgmental
effects of grouping seem consistent with the 
hypothesized function of the proportionate- 
frequency tendency—i.e., to identify the pre- 
sented stimulus or, in this case, the percep- 
tual group to which it belongs. 


Implications and Applications 


Perception versus Judgment. Do the con- 
text effects described in this report represent 
shifts in perception (i.e., the same stimulus 
appearing differently under different condi- 
tions), or do they merely represent shifts in 
the standards with which the stimuli are 
judged? A number of psychologists have been 
concerned with distinguishing between percep- 
tion and judgment (Garner, Hake, & Eriksen, 
1956; Hochberg, 1956; Krantz & Campbell, 
1961; Pratt, 1950; Stevens, 1958), but it 
has proven difficult to specify nonphenomeno- 
logical criteria for the distinction. Helson 
treats shifts in adaptation level as sensory, 
citing his S’s report that the lifting of a given 
weight was more fatiguing when adaptation 
level was lower (Helson, 1947, p. 12); and 
one of the triumphs of Helson’s theoretical 
analysis of judgment has been the demonstra- 
tion that context effects for absolute judg- 
ment (which often seem to be relatively 
verbal or semantic) can be described with 
virtually the same equations which describe 
the context effects for comparative judgment 
(which are usually assumed to represent per- 
ceptual shifts). 

In a previous discussion of comparative 
versus absolute judgments (Parducci & Mar- 
shall, 1962), it appeared more fruitful to 
order the experimental results with respect 
to degree of anchoring rather than in terms 
of a dichotomy between perception and judgment.
Among the present judgments, the
estimates of the actual number of dots (Figure 12)
were the best anchored, in the sense
of being most free of context effects (mid- 
point or median). When instructed to anchor 
their responses to a fundamental physical 
operation (counting), Ss apply the same re- 
sponse to a given stimulus, irrespective of 
the distribution of stimuli in which it appears. 
Should we then say that the distributions do 
not affect the way the stimulus is perceived?
This would appear to agree with common 
usage of the term “perception.” However, 
the invariance of the physical estimates might 
simply reflect S’s ability to counteract in his 
judgments the effects of context upon per- 
ception. Thus, common usage treats percep- 
tion as an intervening variable which is only 
sometimes reflected by a particular dependent 
variable, the perceptual judgment. A phe- 
nomenological orientation forces this di- 
chotomy, the perception representing sub- 
jective experience. Within the framework of 
behaviorism, however, the estimates of the 
number of dots are treated simply as re- 
sponses elicited by the stimuli and by instruc- 
tions which specify (by implication) the 
operations for determining how each numeral 
is to be used. 

None of the other sets of instructions pro- 
vided such specification of the rules for judg- 
ment. However, all of the instructions speci- 
fied the relationship of order between the 
stimulus and response dimensions, and they 
also identified the response for the “average” 
stimulus (or the background size for back- 
ground-1,000). The behaviorist’s concern 
here is not with whether the stimuli judged 
average in the respective distributions were 
perceived as similar but rather with the stimulus
conditions which elicit this average judgment.
These clearly include some of the
features of the stimulus distributions ma- 
nipulated in the present research. The effects 
of context upon this judgment and upon the 
rest of the scale of judgment are important 
insofar as they illuminate general principles 
of judgment. And the question of whether 
such judgments meet phenomenological cri- 
teria for perception could not be answered 
by research of the type reported here. 

Magnitude Estimations. Stevens and Ga- 
lanter (1957) have presented evidence that 


category scales for stimulus dimensions of 
the type used in the present study tend to 
be logarithmically related to ratio scales as 
represented by Stevens’ method of magni- 
tude estimation. Although category scales are 
greatly affected by the spacing and rela- 
tive frequencies of the stimuli, Stevens and 
Galanter asserted that ratio scales are rela- 
tively free of such context effects. However, 
Garner (1958) has used his own work on half 
loudness judgments with the method of con- 
stant stimulus differences as evidence that 
ratio scales are peculiarly subject to context 
effects. Garner also showed that certain context
effects obtained with absolute or category
judgments appeared to be eliminated or 
greatly reduced by discriminability scaling. 

The instructions for the present 0-centered, 
1,000-centered, and background-1,000 condi- 
tions provide fairly close approximations to 
Stevens’ instructions for magnitude estima- 
tions (as given in Stevens, 1956). The only 
clear difference is that Stevens’ instructions 
use specific examples of ratios, so that if the 
standard were 10 and the variable stimulus 
sounded three times as loud, it should be 
judged 30. The choice of how the number 
system is to be used (either to represent 
ratios or distances or something else!) is 
left open for the present 0-centered, 1,000-centered,
and background-1,000 conditions.

The numerical ratios could hardly repre- 
sent ratios of psychological magnitudes with 
the 0-centered scales since either the numerator
or denominator is zero.⁴ However,
Ss were completely free to make a Stevens- 
type use of the numbers when the scale was 
either centered or anchored at 1,000. Com- 
parison between the scales is complicated by 
the extreme variance in range of numerals 
used by different Ss with the magnitude- 
estimation scales. However, if the judgments 
are rescaled to equate for range or if the 
median instead of the mean judgment is 
plotted, the resulting scales for the 0-centered, 
the 1,000-centered, and the five-category con- 
ditions are remarkably similar for the same 


4 This implication for the use of 0-centered scales 
was pointed out by F. N. Jones who has recently 
presented evidence for context effects with magni- 
tude estimations (Jones & Singer, 1961). 




distributions of stimuli (Figures 21, 22, and 
23); and the plot of the five-category scale 
against the 1,000-centered scale is linear 
rather than concave downward. This linearity 
seems contrary to Stevens’ generalization 
about the relationship between category and 
ratio scales. Furthermore, adaptation level 
varies directly with midpoint for even the 
1,000-centered scales which is consistent with 
Garner’s position that ratio scales are highly 
susceptible to context effects. Stevens could 
still maintain that stimulus context affects 
primarily the choice of, or memory for, a 
standard (or modulus). When the present 
instructions request S to call the average 
stimulus either “0” or “1,000,” they really 
call for a category judgment before the mag- 
nitude estimations. If a specific value for the 
standard were explicitly provided or if S were 
completely free to select his own standard, 
the effects of stimulus context would be 
smaller. Garner’s finding that discriminability
scaling reduces the effects of context
receives no corroboration from the present 
research in which the discriminability scale 
values also vary markedly with shifts in the 
stimulus frequencies (Figure 20). 

A specific standard was supplied for the 
background-1,000 conditions, a standard 
which was always present and thus particu- 
larly resistant to context effects. However, 
the curvilinearity of the judgment functions 
for the background-1,000 conditions, varying 
systematically with the median of the stimu- 
lus distribution, provides evidence that sig- 
nificant proportionate-frequency effects occur 
even when the scale is so firmly anchored. 
And these context effects cannot be attributed 
to a prior category judgment which fixes one 
value of the scale. The effects here are to 
squeeze and stretch different portions of S’s 
scale of magnitude estimations, in accordance 
with the proportionate-frequency hypothesis 
as applied to category judgments. 

In spite of the various context effects, the 
plots relating judgments to stimulus number 
(or to the discriminability scale values for 
the Rectangular distribution) appear to re- 
flect departures from a basic function which 
is rather linear—especially for the various 
midpoint conditions. Even the 1,000-cen- 


tered judgments are linear against the dis- 
criminability scale values, and this would 
appear to be inconsistent with Stevens’ power
function. However, the relationships are al- 
most as linear for log-log plots of the 1,000- 
centered judgments against the actual physi- 
cal values of the stimuli so that the loga- 
rithmic range of the stimulus values for this 
research is not great enough to provide a 
clear test between the log and power func- 
tions. These log-log plots do not appear to 
be as linear as the plots of the estimations 
of the actual number of dots (Figure 12). 
It seems likely that Stevens’ instructions, with 
their specific ratio examples, encourage S to 
describe a single physical ratio for each pair 
of stimuli—as with a physical scale on which 
the numbers represent multiples of some 
standard unit. The proportionate-width and 
proportionate-frequency tendencies postulated 
in the present paper would only partially 
serve this function, and the present context 
effects would be absent insofar as S can use 
a physical scale as the standard for judgment. 
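
The point about stimulus range can be illustrated numerically. The following is a minimal sketch, not the analysis used in the research: the stimulus values and the helper names `pearson_r` and `loglog_linearity` are hypothetical. It generates responses from a logarithmic function and measures how nearly linear they appear on a log-log plot (the form expected under a power function), first for a narrow and then for a wide stimulus range.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def loglog_linearity(stimuli):
    """Correlation of log(response) with log(stimulus) when the responses
    actually follow a logarithmic function of the stimulus."""
    responses = [math.log(s) for s in stimuli]  # hypothetical log-law judgments
    return pearson_r([math.log(s) for s in stimuli],
                     [math.log(r) for r in responses])


# Hypothetical stimulus sets: roughly a 3.5:1 range versus roughly a 1000:1 range.
narrow = [10 * 1.15 ** i for i in range(10)]
wide = [10 * 2.15 ** i for i in range(10)]

r_narrow = loglog_linearity(narrow)
r_wide = loglog_linearity(wide)
print(f"narrow range r = {r_narrow:.4f}, wide range r = {r_wide:.4f}")
```

Under these assumptions the log-law responses look more nearly linear on log-log coordinates for the narrow range than for the wide one, which is why a restricted logarithmic range gives a weak test between the two functions.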

The question of whether Stevens’ ratio 
scaling provides a more valid measure of sen- 
sation requires specification of the criteria 
for validation. The present research indicates 
that stimuli may be scaled in the same way 
irrespective of whether the instructions for 
judgment specify verbal categories or instead 
permit the use of unrestricted numerical 
scales, anchored only at a single stimulus 
value. It seems clear that the notion of cate- 
gory scaling must be broadened to include 
procedures quite similar to those of Stevens’ 
method of magnitude estimation. It would 
also appear that S uses his judgments in ac- 
cordance with the range-frequency compro- 
mise even though free to use as many cate- 
gories or numbers as seem appropriate—and 
even though free to use the numbers to de- 
scribe ratios of sensation. 

Generalizability of the Conclusions. Much 
of the interest in psychological principles of 
judgment is due to their apparent relevance 
to everyday life. The concept of the frame 
of reference, the belief that perceptual judg- 
ments depend upon relationships between 
various past and present stimuli, applies 
whenever man is characterizing his environment.
How applicable is the range-frequency
compromise to the wide variety of situations 
for which the frame-of-reference concept has 
seemed relevant? 

The evidence for the range-frequency com- 
promise is based upon research with simple 
physical dimensions, visual length, size, nu- 
merousness, lifted weight, and also upon judg- 
ments of numerical magnitude. Although nu- 
merical magnitude appears more abstract than 
the psychophysical dimensions, the scaling of 
the stimuli (numerals) is even simpler (i.e.,
linear). Would the same context effects be 
obtained with stimuli whose physical differ- 
ences were much more complex? Insofar as 
it is easier to discriminate and remember stim- 
uli which vary with respect to more than 
a single physical dimension, the role of the 
stimulus extremes, the end values, might 
actually be enhanced. Readjustment was 
rapid following restrictions of the range for 
lifted weights in the present research. But if 
S had been permitted to see each weight as 
it was lifted and if the weights had been 
colored or otherwise visually differentiated to 
facilitate memory of the lightest and heaviest 
values, it seems likely that there would have 
been much less postshift readjustment of the 
scale of judgment. The extreme situations of 
nonlaboratory life (e.g., traumatic or ecstatic 
events) may be relatively enduring features 
of the stimulus context for day-to-day judg- 
ments. 

Working against this opportunity for bet- 
ter discrimination and retention of complex 
stimuli is a recency factor which favors the 
retention of the materials exposed in a single 
laboratory session. Although significant sequential
effects were not obtained in the present
research (with one minor exception),
they would perhaps have been much
more prominent if the interpresentation inter- 
val had been hours or days instead of seconds. 
This should be demonstrable in the laboratory 
through the use of longer intervals, especially 
if Ss were required to perform other tasks 
between the stimulus presentations. Elimina- 
tion of the constant background around the 
squares in the present situation might also 
have enhanced sequential effects. Shifts in 
the remembered values or PSEs for the end 


stimuli are systematic, toward the values of 
the other stimuli presented for judgment. 
Although this central tendency effect has fre- 
quently been demonstrated for judgments of 
psychophysical materials, it may be considera- 
bly greater for the widely spaced value judg- 
ments of everyday life. 

Another characteristic difference between 
laboratory situations and the everyday occa- 
sions for judgments is that the former usually 
require S to explicitly judge each stimulus
when it is presented. The sequence of stimulus 
events in everyday life rarely elicits overt 
judgments, almost never for each successive 
stimulus in a long repetitive series. Although 
the use of incidental-exposure procedures for 
psychophysical judgments has produced typi- 
cal context effects, the influence of inciden- 
tally exposed stimuli appears to be more tran- 
sitory (Parducci, 1956b). One might expect 
the differences between the effects of inciden- 
tal exposure and the present experimental 
procedure to be greatest with respect to the 
proportionate-frequency tendency. Insofar as 
S does not make overt judgments, an occa- 
sional judgment should be less likely to reflect 
the proportionate-frequency tendency. 

The instructions for judgment for most of 
the present conditions prescribed a set of 
verbal categories, adjectives sometimes modi- 
fied by adverbs. However, the judgments 
were usually recorded by S in numerical form 
(Very small—1, Small—2, etc.). Similar
scales were obtained when the weight lifting 
Ss announced their verbal categories aloud 
and also for the various column recording 
conditions which required S to record the 
presentation number in a column headed by 
the appropriate verbal category. The 0-cen- 
tered, 1,000-centered, and background-1,000 
magnitude-estimation scales also yield scales 
of judgment which closely resemble those 
obtained using the more extensively anchored 
verbal categories. The generality of these 
scales has thus been given considerable test in 
the present research. Would these adverbial 
modifiers (very, slightly) yield scales of the 
same form if combined with different adjec- 
tives (good, beautiful, schizophrenic) than the 
ones used in the present study (small, heavy, 
few)? The recent work of Cliff (1959) suggests
that they would. Using discriminability
scaling of semantic differential ratings of ad- 
verb-adjective combinations, Cliff had impres- 
sive success with a model which assigned a 
scale value to each adverb and also to each 
adjective, the scale value of their combina- 
tion being equal to the product of their sepa- 
rate scale values. This stability in the modi- 
fying effect of adverbs encourages the 
assumption that the rules of judgment, cate- 
gory spacings and relative frequencies of use, 
may also be stable across different dimensions 
of judgment. 
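
The multiplicative rule attributed to Cliff can be sketched in a few lines. The scale values below are hypothetical and are not taken from Cliff (1959); the names `ADVERB_MULTIPLIER`, `ADJECTIVE_VALUE`, `combined_value`, and `adverb_ratio` are likewise assumptions made for illustration.

```python
# Hypothetical scale values, chosen only to illustrate the product rule.
ADVERB_MULTIPLIER = {"slightly": 0.5, "(none)": 1.0, "very": 1.5}
ADJECTIVE_VALUE = {"bad": -1.0, "good": 1.0, "beautiful": 1.4}


def combined_value(adverb, adjective):
    """Scale value of an adverb-adjective combination under the product rule:
    the adverb's value multiplies the adjective's value."""
    return ADVERB_MULTIPLIER[adverb] * ADJECTIVE_VALUE[adjective]


def adverb_ratio(adverb_a, adverb_b, adjective):
    """Ratio of two adverbs' effects when both modify the same adjective."""
    return combined_value(adverb_a, adjective) / combined_value(adverb_b, adjective)


for adjective in ("good", "beautiful", "bad"):
    print(adjective, adverb_ratio("very", "slightly", adjective))
```

Under a product rule the ratio between two adverbs is the same whichever adjective they modify. This is the sense in which the modifying effect of an adverb is stable across adjectives.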

As used in the present research, the method 
of single stimuli need not explicitly anchor 
any of the categories of judgment to specific 
stimulus values. There may be some implicit 
anchoring of the end categories to the stimu- 
lus extremes but the midpoint effects are ob- 
tained even when Ss use the unrestricted mag- 
nitude-estimation scales, centered at either 0 
or 1,000, with no specified end categories. 
However, anchoring is a prominent feature of 
everyday value judgments where education 
or social influence tends to fix terms like 
“good,” “middle of the road,” etc., to spe- 
cific stimulus events. A common laboratory 
procedure for anchoring uses the method of 
constant stimulus differences, the “equal” 
category or middle limen being anchored to 
the standard on alternate presentations. This 
method was used by Harris (1948) in a 
study of the context effects for comparative 
judgments of the pitch of pure tones. Besides 
confirming the predominant influence of the 
comparison values upon the PSE (similarly 
demonstrated by Doughty, 1949), Harris 
(1948) found that independent variation of 
the mean of these values had little effect 
upon PSE. On this basis, he suggested that 
the effective standard for comparative judg- 
ment “is a function of the series median, or 
possibly the midpoint of a range, rather than 
of a complicated mean computation [p. 317].”
In view of the similarity of the central 
tendency effects obtained for absolute and 
comparative procedures (Parducci & Marshall, 
1962), and also of the similarity between the 
unanchored and the well-anchored background-1,000
scales of the present research,
it seems reasonable to hypothesize that the 


range-frequency compromise may also be use- 
ful for describing context effects in the more 
typically anchored situations of everyday 
experience. 

However, applications to social judgment 
may be limited in important ways by the 
hortatory function of value judgments. The 
present research simply required Ss to de- 
scribe the stimuli. It seems unlikely that Ss 
were using their judgments to modify opinions 
or values. But in many social situations, 
judgments have the primary purpose of chang- 
ing attitudes. Thus, Sherif and Hovland 
(1961) find that a relatively wide range of 
political positions get lumped together under 
one category, rejection, when these positions 
differ markedly from the position taken by 
the judge. It is not that the judge has greater 
difficulty discriminating these extreme posi- 
tions since the lumping effect is largely elimi- 
nated by discriminability scaling. Rather, it 
would appear that these positions are placed in 
the same category so that the disapprobation 
already attached to the most extreme positions 
will generalize to less extreme positions which 
are also rejected by the judge. Insofar as 
judgments are used to influence opinions in 
this way, the simple principles of perceptual 
judgment studied in the present research will 
not be adequate to explain value judgments. 

Further Theoretical Development. The 
special appeal of the theory of adaptation 
level has been that it provides in the 
weighted-mean equation a quantitative frame- 
work for describing the frame of reference. 
The postulation of the range-frequency com- 
promise, in terms of which the data from 
the present research are interpreted, raises 
difficulties for the weighted-mean equation 
and the concept of adaptation level as the 
reference point for the scale of judgment. 
No specific model has been presented for 
quantitatively describing the range-frequency 
compromise. The simple linear compromise, 


adaptation level = .55 (Midpoint) + .45 (Median)


worked reasonably well for judgments of 
numerical magnitude (Parducci et al., 1960), 
but no attempt was made to predict other 
judgments from adaptation level. The form 
of the judgment scale is of major concern 




since the ultimate goal must be to under- 
stand the basis for specific responses over the 
entire scale of judgment. The present re- 
search indicates that adaptation level is 
sometimes a poor index to other portions of 
the scale of judgment (i.e., the scale may 
take peculiar, nonlinear forms). Although
the range-frequency compromise appears to 
operate over the entire scale, the present state- 
ment of the compromise yields only qualita- 
tive or directional predictions. The general 
notions concerning the nature of the compro- 
mise need considerable sharpening. Mathe- 
matical models could be developed to provide 
more powerful tests of these notions. Perhaps 
these would dictate the selection of simpler 
experimental situations, with fewer stimuli, 
fewer judgments, and more explicit anchor- 
ing. Somewhat different models may be re- 
quired for the different kinds of anchoring 
to which adaptation level theory has been 
applied. However, the evidence for some kind 
of general range-frequency compromise now 
appears strong enough to warrant serious at- 
tempts toward a more powerful theoretical 
treatment of this approach to the explanation 
of context effects in judgment. 
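
The simple linear compromise reported for numerical magnitude (weights of .55 on the midpoint and .45 on the median) can be computed directly. This is a minimal sketch: the function name `linear_compromise` and the two sets of numerals are hypothetical, and the weights are those reported for judgments of numerical magnitude only.

```python
from statistics import median


def linear_compromise(stimuli, w_midpoint=0.55, w_median=0.45):
    """Adaptation level as a weighted sum of the midpoint (mean of the two
    end values) and the median of the stimuli presented for judgment."""
    midpoint = (min(stimuli) + max(stimuli)) / 2
    return w_midpoint * midpoint + w_median * median(stimuli)


# Hypothetical sets of numerals with the same end values (same midpoint, 550)
# but different medians (250 versus 850).
positively_skewed = [100, 150, 200, 250, 300, 400, 1000]
negatively_skewed = [100, 700, 800, 850, 900, 950, 1000]

print(linear_compromise(positively_skewed))  # about 415
print(linear_compromise(negatively_skewed))  # about 685
```

Because the end values are held constant in this example, the difference between the two predicted adaptation levels comes entirely from the median term.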


SUMMARY


This research was designed to investigate 
implications of an approach to absolute judg- 
ment which treats the scale of judgment as a 
compromise between the tendency to divide 
the range of stimuli into proportionate sub- 
ranges and the tendency to use the alterna- 
tive categories of judgment with proportionate 
frequencies.
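
The two tendencies named above can be made concrete with a small sketch. It is an illustration only: the research presents no quantitative model of the compromise, and the equal-width and equal-frequency rules below, the function names, and the stimulus values are all simplifying assumptions. In particular, the subranges discussed in the research are proportionate in discriminability rather than necessarily equal in physical units.

```python
def range_limens(stimuli, k):
    """Category limens if S simply divided the stimulus range into k
    equal-width subranges (a simplified range tendency)."""
    lo, hi = min(stimuli), max(stimuli)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def frequency_limens(stimuli, k):
    """Category limens if S used each of k categories equally often
    (a simplified frequency tendency). Each limen is placed midway
    between the last stimulus of one category and the first of the next."""
    ordered = sorted(stimuli)
    n = len(ordered)
    limens = []
    for i in range(1, k):
        cut = i * n // k  # index of the first stimulus in the next category
        limens.append((ordered[cut - 1] + ordered[cut]) / 2)
    return limens


# Hypothetical positively skewed set of presentations.
stimuli = [10, 12, 14, 16, 18, 20, 25, 30, 50, 90]

print(range_limens(stimuli, 2))      # [50.0]
print(frequency_limens(stimuli, 2))  # [19.0]
```

For this skewed set the two rules place the single limen far apart (50 versus 19). A compromise between the tendencies would put the limen somewhere between the two values.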

Independent Variables. The primary con- 
cern was with the effects of independent 
manipulation of the midpoint (mean of the 
two end values), median, and mean values 
of the stimuli presented for judgment. Each 
of these three parameters of the stimulus dis- 
tribution was independently varied for each 
of three stimulus dimensions: numerousness 
of dots, size of squares, and heaviness of lifted 
weights. Most of the research used the 
method of single stimuli, with successive 
presentation of the stimuli. Different orders 
or sequences of presentation were used for 
each set of stimuli. Simultaneous presentation
was also used for the judgments of numerousness.
Each S was run under only one
condition except that the effects of shifting 
the stimulus distribution were also investi- 
gated with some Ss judging weights. 

Most of the conditions were studied using 
two alternative methods for recording the 
responses: either randomized in the order of 
presentation or segregated by categories into 
separate columns so as to provide a simple 
visual record of the cumulated frequencies of 
category usage. Under different conditions, 
Ss were restricted to either two, four, five, 
or six categories of judgment. For some con- 
ditions, Ss were free to formulate their own 
numerical scales, restricted only with respect 
to the “average” stimulus value or the value 
of the background stimulus. Estimations were 
also obtained of the actual number of dots in 
each pattern under different distribution con- 
ditions. 

Dependent Variables. The adaptation level 
defined as the mean of the stimuli to which 
the middle category was applied (or the mid- 
dle limen) was the major dependent variable 
obtained for every experimental condition. 
Scales of judgment were also presented to 
show the mean judgment of each stimulus, the 
mean of the stimulus values to which each 
category was applied, and the values on 
Thurstone scales of equal discriminability. 
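
The first two dependent variables can be computed from a record of judgments as follows. This is a minimal sketch under stated assumptions: the record is hypothetical, the category codes (1 = very small through 5 = very large) and the function names are chosen for illustration, and the alternative definition by the middle limen is not shown.

```python
from collections import defaultdict
from statistics import mean


def category_means(judgments):
    """Mean of the stimulus values to which each category was applied.
    `judgments` is an iterable of (stimulus_value, category) pairs."""
    by_category = defaultdict(list)
    for stimulus, category in judgments:
        by_category[category].append(stimulus)
    return {category: mean(values) for category, values in by_category.items()}


def adaptation_level(judgments, middle_category):
    """Adaptation level, taken here as the mean of the stimuli to which
    the middle category was applied."""
    return category_means(judgments)[middle_category]


# Hypothetical record for one S using five categories.
record = [(10, 1), (14, 2), (18, 3), (20, 3), (25, 3), (30, 4), (50, 4), (90, 5)]

print(adaptation_level(record, middle_category=3))  # 21
print(category_means(record))
```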


Conclusions. The postulated tendency to 
divide the range into proportionate subranges 
was supported by significant midpoint effects, 
adaptation level varying directly with vari- 
ation in the midpoint for all stimulus di- 
mensions and methods of recording. The 
quasi-logarithmic relationship obtained be- 
tween the judgments and the physical values 
of the stimuli was interpreted as consistent 
with the proportionate-subrange principle, 
successive category limens marking off equally 
discriminable differences on the stimulus dimensions.
The greater shift in judgment following
extension than following restriction of
the range of stimuli also appeared consistent 
with this principle. 

The postulated proportionate-frequency 
tendency was supported by the demonstration 
that adaptation level varied directly with variation
in the median for all stimulus dimensions
and methods of recording. Deviations
from the quasi-logarithmic form of the mean 
judgment functions also supported the pro- 
portionate-frequency principle in that hypo- 
thetical fixed-frequency functions, represent- 
ing what these functions would have been had 
the categories been used with the same fre- 
quencies as they had been used for the Rec- 
tangular distributions, exaggerated the ob- 
tained deviations (even for equal discrimin- 
ability scaling by the method of successive 
intervals). It also appeared that these deviations
would have been greater were it not for
the proportionate-subrange tendency. The ef- 
fects of the method of recording the judg- 
ments were not great, but these also were 
consistent with the proportionate-frequency 
tendency. 

Independent variation in the mean, con- 
trary to predictions from previous applications 
of the theory of adaptation level, had no 
consistent effect upon adaptation level. How- 
ever, the research on the role of the mean 
revealed a tendency to place category limens 
at gaps in the stimulus distribution (i.e., 
where the steps between successive stimulus 
values are most discriminable). This tendency 
and also the proportionate-subrange and pro- 
portionate-frequency tendencies appeared con- 
sistent with a functional interpretation of 
judgment, the task for judgment being to 
efficiently identify the different stimuli or 
classes of stimuli and to describe the im- 
portance to S of the successive stimulus dif- 
ferences. 


The effects of varying the instructions for 
the categories of judgment (restrictions to a 
small set of categories versus unrestricted nu- 
merical scales, fixed only at a single stimulus 
value) did not interact with either the median 
or midpoint effects. Different portions of the 
response scale appear to be used with fixed 
relative frequencies, and even the magnitude- 
estimation scales are used to subdivide the 
range of stimuli into equally discriminable 
steps. Equal differences on the response 
scale tend to correspond to equal ratios be- 
tween the physical values of the stimuli. The 
power function asserted for ratio judgments 
appears most clearly when S is instructed to 
describe the physical values of the stimuli. 

The results provide a more extensive em- 
pirical basis for the compromise theory of 
judgment which had previously been sup- 
ported by research with relatively abstract 
stimulus dimensions and simultaneous rather 
than successive exposure of the stimuli. While 
generally consistent with the adaptation level 
approach, the results raise the following diffi- 
culties: (a) the end values must be weighted 
more heavily than other stimuli in the dis- 
tribution presented for judgment, (b) the 
mean provides only a crude first approxima- 
tion for the prediction of adaptation level 
or specific judgments, (c) adaptation level 
is sometimes an unrepresentative index to 
the scale of judgment, and (d) the basic 
adaptation level equations do not provide an 
adequate description of the tendency to use 
different portions of the response scale with 
fixed relative frequencies. 


REFERENCES 


BEEBE-CENTER, J. G. The psychology of pleasantness
and unpleasantness. New York: Van Nostrand,
1932.

CAMPBELL, D. T., HUNT, W. A., & LEWIS, N. A. The
effects of assimilation and contrast in judgments of
clinical materials. Amer. J. Psychol., 1957, 70,
347-360.

CLIFF, N. Adverbs as multipliers. Psychol. Rev., 1959,
66, 27-44.

DOUGHTY, J. M. The effect of psychophysical method
and context on pitch and loudness functions. J.
exp. Psychol., 1949, 39, 729-745.

EDWARDS, A. L., & THURSTONE, L. L. An internal
consistency check for scale values by the method
of successive intervals. Psychometrika, 1952, 17,
169-180.

FISHER, R. A., & YATES, F. Statistical tables for biological,
agricultural, and medical research. (5th ed.)
New York: Hafner, 1957.

GARNER, W. R. Advantages of the discriminability
criterion for a loudness scale. J. Acoust. Soc. Amer.,
1958, 30, 1005-1012.

GARNER, W. R., & HAKE, H. W. The amount of information
in absolute judgments. Psychol. Rev.,
1951, 58, 446-459.

GARNER, W. R., HAKE, H. W., & ERIKSEN, C. W.
Operationism and the concept of perception. Psychol.
Rev., 1956, 63, 149-159.




GUILFORD, J. P. Psychometric methods. New York:
McGraw-Hill, 1954.

HARRIS, J. D. Discrimination of pitch: Suggestions
toward method and procedure. Amer. J. Psychol.,
1948, 61, 309-322.

HELSON, H. Fundamental problems in color vision:
I. The principle governing changes in hue, saturation,
and lightness of non-selective samples in
chromatic illumination. J. exp. Psychol., 1938, 23,
439-476.

HELSON, H. Adaptation-level as frame of reference for
prediction of psychophysical data. Amer. J. Psychol.,
1947, 60, 1-29.

HELSON, H. Adaptation level theory. In S. Koch
(Ed.), Psychology: A study of a science. Vol. 1.
Sensory, perceptual, and physiological formulations.
New York: McGraw-Hill, 1959. Pp. 565-621.

HOCHBERG, J. Perception: Toward the recovery of
a definition. Psychol. Rev., 1956, 63, 400-405.

HOLLINGWORTH, H. L. The central tendency of judgment.
J. Phil., Psychol. scient. Meth., 1910, 7,
461-468.

JOHNSON, D. M. Generalization of a scale of values
by the averaging of practice effects. J. exp. Psychol.,
1944, 34, 425-436.

JOHNSON, D. M. Learning function for a change in
the scale of judgment. J. exp. Psychol., 1949, 39,
851-860.

JOHNSON, D. M. The psychology of thought and
judgment. New York: Harper, 1955.

JONES, F. N., & SINGER, D. Context effects in magnitude
estimation. Paper read at Western Psychological
Association, Seattle, June 1961.

KRANTZ, D. L., & CAMPBELL, D. T. Separating perceptual
and linguistic effects of context shifts upon
absolute judgments. J. exp. Psychol., 1961, 62,
35-42.

McGARVEY, H. R. Anchoring effects in the absolute
judgment of verbal materials. Arch. Psychol., N. Y.,
1943, 39, No. 281.

PARDUCCI, A. Learning variables in the judgment of
single stimuli. J. exp. Psychol., 1954, 48, 24-30.

PARDUCCI, A. Direction of shift in the judgment of
single stimuli. J. exp. Psychol., 1956, 51, 169-178.
(a)

PARDUCCI, A. Incidental learning of stimulus frequencies
in the establishment of judgment scales.
J. exp. Psychol., 1956, 52, 112-118. (b)


PARDUCCI, A. An adaptation-level analysis of ordinal
effects in judgment. J. exp. Psychol., 1959, 58,
239-246.

PARDUCCI, A., CALFEE, R. C., MARSHALL, L. M., &
DAVIDSON, L. P. Context effects in judgment:
Adaptation level as a function of the mean, midpoint,
and median of stimuli. J. exp. Psychol.,
1960, 60, 65-77.

PARDUCCI, A., & HOHLE, R. Restriction of range in the
judgment of single stimuli. Amer. J. Psychol., 1957,
70, 272-275.

PARDUCCI, A., & MARSHALL, L. M. Supplementary report:
The effects of the mean, midpoint, and median
upon adaptation level in judgment. J. exp.
Psychol., 1961, 61, 261-262. (a)

PARDUCCI, A., & MARSHALL, L. M. Context effects in
judgments of length. Amer. J. Psychol., 1961, 74,
576-583. (b)

PARDUCCI, A., & MARSHALL, L. M. Assimilation vs.
contrast in the anchoring of perceptual judgments
of weight. J. exp. Psychol., 1962, 63, 426-437.

PRATT, C. C. The role of past experience in visual
perception. J. Psychol., 1950, 30, 85-107.

SHERIF, M., & HOVLAND, C. I. Social judgment. New
Haven: Yale Univer. Press, 1961.

SHERIF, M., TAUB, D., & HOVLAND, C. I. Assimilation
and contrast effects of anchoring stimuli on
judgments. J. exp. Psychol., 1958, 55, 150-155.

STEVENS, S. S. The direct estimation of sensory magnitudes:
Loudness. Amer. J. Psychol., 1956, 69,
1-25.

STEVENS, S. S. Adaptation-level vs. the relativity of
judgment. Amer. J. Psychol., 1958, 71, 633-646.

STEVENS, S. S., & GALANTER, E. H. Ratio scales and
category scales for a dozen perceptual continua.
J. exp. Psychol., 1957, 54, 377-411.

TAJFEL, H. The anchoring effects of value in a scale
of judgments. Brit. J. Psychol., 1959, 50, 294-304.

TORGERSON, W. S. Theory and methods of scaling.
New York: Wiley, 1958.

VOLKMANN, J. Scales of judgment and their implications
for social psychology. In J. H. Rohrer &
M. Sherif (Eds.), Social psychology at the crossroads.
New York: Harper, 1951.

WOODROW, H. Weight discrimination with a varying
standard. Amer. J. Psychol., 1933, 45, 391-416.


(Received July 18, 1962) 




Vol. 77, No. 3 Whole No. 566, 1963 


Psychological Monographs: General and Applied 


TACTICS OF INGRATIATION AMONG LEADERS AND 
SUBORDINATES IN A STATUS HIERARCHY 1


EDWARD E. JONES, KENNETH J. GERGEN, AND ROBERT G. JONES


Duke University 


High and low status personnel in a Naval ROTC program were instructed to 
exchange written communications about themselves. Half of these pairs com- 
municated under instructions stressing the importance of mutual attraction; 
half were under instructions emphasizing accuracy. From the communication 
messages it was possible to draw the following conclusions: (a) Conformity— 
low status subjects conformed more than highs as an increasing function of the 
relevance of the issue to the basis of the hierarchy. (b) Self-presentation—high 
status subjects became more modest when under pressure to make themselves 
attractive; low status subjects showed the same tendency on important items 
but became more self-enhancing on unimportant ones. (c) Other enhancement— 
low status subjects were more positive in their public appraisals of the high
status subjects than vice versa.


THE present study explored some of the
consequences of varying the importance
of compatibility in pairs whose members
clearly differ in status. Our hope was to shed
some light on the social behavior of leaders
and followers in task oriented groups. Pre- 
sumably, all such groups face maintenance 
problems as well as problems associated with 
task achievement. This is clearly indicated by 
Homans’ (1950) well-documented distinction 
between the internal and the external systems. 
If the members of a functioning group are 
not at least minimally attracted to each other, 
the strain of interacting in the achievement 
of group goals should in the long run impair
task performance.

1 This investigation was made possible by a grant
from the National Science Foundation (NSF-G8857).
The authors are very indebted to the officers of the
Duke University Naval ROTC unit for their facili-
tation of the study. Our special gratitude extends to
T. R. McCants, W. R. Fisher, Jr., M. E. Shirley, and
F. J. Wade for their efforts in the recruitment of
subjects. We also would like to thank Hilda Dickoff
for her contributions as a “ghost writer” during the
first year of the study.

But how is this mutual attraction main-
tained when there are clear differences in
role and status in a group? The research
literature has thus far concentrated on affectional
relations (or “cohesiveness”) among
peers, but the development and maintenance 
of personal attraction between leader and fol- 
lower, or between high and low status group 
members, has been less intensively studied. 
In order to gain some insights into this prob- 
lem, the present study focused on communica- 
tions taking place in a quasi-military hier- 
archical dyad. The communications were 
not directly concerned with task performance. 
Status was defined in terms of class seniority
within a student ROTC group, with the ex- 
perimenter capitalizing on this difference in 
seniority to assign consonant “commander” 
and “subordinate” roles in the experiment. 
Generally speaking, there would seem to be 
a number of reasons why the lower status 
follower is concerned with the degree to which 
the higher status leader is attracted to him. 
A nearly universal perquisite of leadership 
status is the capacity to control outcomes of 
the follower. In most organizational hier- 
archies, the lower status person is dependent 
on his superordinates for task definition, per- 
formance evaluation, remuneration, oppor- 
tunities for advancement, etc. As Thibaut 
and Kelley (1959) point out, one way in 
which the lower power person can blunt or 
reduce the power which the high status person
actually applies is to become attractive
to the latter. Cohen (1958), for example, has 
shown how critical comments to the leader 
decrease when dependency is enhanced by 
instructions. 

It perhaps is not quite so obvious why the 
high status leader is concerned with his at- 
tractiveness to the followers. In the most 
ruthless, autocratic organizations, the follow- 
ers may be motivated by fear of punishment 
or controlled by automatically administered 
rewards so that affection for the leader is 
dispensable. In the vast majority of organi- 
zational situations, however, the follower has 
considerable counter power (cf. Thibaut & 
Kelley, 1959). He may use this power by 
brandishing threats to leave the organization, 
or by forming coalitions with other followers 
to restrict output. In the typical case, then, 
the leader who is concerned with group ef- 
fectiveness will also be concerned with the 
loyalty and spontaneous affection of his fol- 
lowers. By earning their positive regard for 
him as a person, the leader may effectively 
neutralize the followers’ counter power and 
more successfully exert control in the direc- 
tion of organizational goals. 

There are, to be sure, additional reasons 
why leaders and followers might wish to se- 
cure each other’s affection. The leader may 
be more favorably evaluated by his superiors
if he can inspire the affection and loyalty of 
his crew. Then too, it seems likely that both 
the leader and the follower import general 
needs to be liked into the organizational situa- 
tion. Thus, the ability to inspire the affection 
of another person may represent a gratifying 
conquest for both subordinate and superior. 

It was not the purpose of the present study
to demonstrate that both leaders and fol- 
lowers are concerned with their attractiveness 
to each other; the purpose was rather to ex- 
plore a few of the major ways in which this 
attraction is sought. The focus was thus on 
the tactics of ingratiation rather than on its 
motivational basis. These tactics are con- 
ceived of as ways of presenting oneself to 
another person. They may or may not in- 
volve conscious, rational decisions. They com- 
prise, in Goffman’s (1959) terms, the arts of 
impression management. By the communications
Person P addresses to O, he projects
certain features of himself that he wishes O
to assimilate. Although there are undoubtedly
large and consistent individual differences in 
the characteristics which different persons at- 
tempt to present for social consumption, we 
can also expect self-presentations to vary 
markedly as a function of the situation and 
the individual’s role. The design of the pres- 
ent study was developed to test the general 
hypothesis that in a well-defined leadership 
hierarchy, aroused motivation to elicit attrac- 
tion gives rise to different interpersonal tactics 
for leader and follower. 

In normal interpersonal discourse, the par- 
ticipants may communicate to each other with 
reference to each of the types of items de- 
scribed by Heider’s (1946) P-O-X notational 
system. Thus P can speak to O about P’s 
characteristics (direct self-disclosure), about 
O’s characteristics (the appraisal of the 
other), or about some object or event (X) 
external to the relationship. In communi- 
cating about each of these item types, the 
individual may present crucial data about 
himself. Thus the motive to make oneself at- 
tractive to another (here called the motive 
to ingratiate) may achieve expression in vari- 
ous ways. 

In other words, we might say that different 
tactics of ingratiation are involved with each 
referent item in the P-O-X formation. If O 
is the referent, P may convey the impression 
that he is attracted to O and that he thinks 
highly of him. Such a tactic capitalizes on 
the commonly observed “congruency” between 
liking someone and perceiving that he likes 
you (Tagiuri, Bruner, & Blake, 1958). Thus 
it is difficult for O to remain unaffected by 
information that P finds him attractive. This 
will tend to increase O’s attraction for P.
For convenience we might call this the tactic 
of other enhancement. 

If X is the referent of communication, P 
may attempt to emphasize the fact that he 
and O share the same values and opinions 
about important things. If P is highly moti- 
vated to ingratiate himself with O, and they in 
fact do not share the same opinions, some 
amount of tactical conformity will be required. 

A final alternative concerns P as the referent
of his own communications. That is, P
may directly inform O about certain attri- 
butes of himself to enhance the possibility that 
O will find him attractive. It is difficult to 
prescribe the most effective tactic here with- 
out taking many factors into consideration. 
P must strike a balance between boastful 
self-aggrandizement and the kind of self- 
derogation which bespeaks insecurity and a 
disturbing lack of confidence. Blau (1960)
has commented on this dilemma, suggesting
that,

creating a good first impression is a subtle form of
bragging, but its success depends on its being so
subtle that it does not appear to be bragging at all
[p. 547].

In discussing, below, the tactics involved in
selectively presenting one’s own attributes to
others, we shall use the term self-presentation.

If we now turn to the specific problem of
ingratiation in a status hierarchy, there are 
differences between the leader’s and the fol- 
lower’s situations which should have important 
implications for tactical variations in be- 
havior. The low status follower, by virtue of 
his poor power position alone, is likely to be 
highly motivated to create an attractive im- 
pression. He is dependent in many ways on 
a favoring disposition of the more powerful 
high status superordinate. One can appreci- 
ate why this dependence normally gives rise 
to potent ingratiation motives. By the same 
token, however, the dependence is obvious 
enough for the more easily detected ingratia- 
tion tactics to be avoided as they provide 
clear evidence of manipulative intent. The 
low status subordinate must succeed in “man- 
aging” an attractive impression without run- 
ning the risk of being called a “yes man” or 
a sycophant. Of the three tactics mentioned 
above, there are reasons why conformity 
might seem to be the most appropriate for the 
typical subordinate. Conformity to the 
opinions of the leader is effective because it 
is difficult to discriminate between conformity 
and genuine attitude similarity; opinion agree- 
ment bolsters the validity of the leader’s views 
without raising obvious questions about de-
vious intentions. The high status recipient of 
agreement is not likely to suspect its tactical 
origin because, from his perspective, it is
gratifying but hardly surprising when people
believe what is “correct.” 

The other enhancement tactic seems less 
appropriate for the low status person because 
his evaluations are based on standards of un- 
known validity (he may be perceived as com- 
paring the leader’s attributes to those of low 
status people like himself), and because the 
use of direct compliments is such an obvious 
tactic and one which can be exercised at low 
emotional or intellectual cost. Nor are tactics 
involving modesty of self-presentation likely 
to play an important role for the low status 
person. As Blau (1960, p. 550) argues, if a per- 
son is not at all “impressive” to begin with, 
self-deprecation can only embarrass others 
and tends to make the unattractive person 
even less attractive. On the other hand, the 
dangers of publicly overevaluating the self 
are obvious. The low status person is prob- 
ably better off avoiding the tactical use of self- 
enhancement or self-deprecation in his efforts 
to elicit attraction. 

Turning to the high status or superordi- 
nate person, the strategic situation seems quite 
different. For one thing, while we have con- 
tended that the superordinate is normally mo- 
tivated to enlist the sympathy and liking of 
the subordinate, he is also involved in main- 
taining the subordinate’s respect for his task 
competence, his integrity, and his dedication to 
the organization they both represent. Many 
of the tactics of ingratiation described above 
would be incompatible with this maintenance 
of respect. He must win the subordinate’s 
support, which involves certain elements at 
least of affectional attraction, without under- 
mining his own respectability and power. In 
surveying the three available tactics of ingra- 
tiation, the tactic of conformity seems most 
vulnerable to these considerations. At some 
point in his interactions with the subordinate, 
the high status leader must demonstrate his 
capacity to form independent judgments in 
areas where his experience and his role render 
him likely to be more competent than the sub- 
ordinate. While the leader may seek out 
opinion issues on which he can safely agree 
with his subordinates, he has much to lose if 
his conformity is indiscriminate. Also, the
leader who adopts the tactic of conformity
soon finds that he cannot agree with all of his
subordinates, unless they agree among them- 
selves. 

In many leadership contexts, however, the 
more direct tactic of other enhancement may 
commend itself to the high status person. To 
evaluate someone positively to his face im- 
plies that you are in a position to pass judg- 
ment—a consideration which is in line with 
the status differential involved. The tactic of 
distorting one’s evaluations in the positive di- 
rection when they are made public may in- 
crease the subordinate’s loyalty and affection 
without reducing the necessary social distance 
between leader and follower. Even if the sub- 
ordinate perceives that the evaluation is over- 
drawn and unreasonably positive, he is likely 
to place a benign cast on the leader’s motiva- 
tion, and to see him as acting for the good 
of the organization (to improve morale) 
rather than for obvious personal gain. 

The leader is likely to be especially con- 
cerned with effective self-presentation in his 
communications to the subordinate. Blau 
(1960), for example, feels that the high status 
person faces the problem of impressing oth- 
ers without losing their affection for him 
(though he is somewhat uncertain about the 
importance of this affection as long as the 
leader’s talents are highly needed). The more 
impressive a person becomes, the more unap- 
proachable he becomes (p. 547) and the more 
difficult it is to initiate social interchanges with 
him. The tendency for respect and liking to 
be inversely correlated—at least as respect 
implies high impressiveness—is sometimes 
handled by the sharing of leadership roles be- 
tween a task leader and a social-emotional 
leader (cf. Bales, 1958). When this is not 
possible, however, the high status person must 
find ways to demonstrate his approachability 
without at the same time destroying his im- 
pressiveness or respectability. As Blau implies, 
he may do this by (a) emphasizing such 
shared characteristics as ethnic background, 
interest in the sports news, etc.; and/or (b) 
by presenting himself in a self-deprecating 
manner. But the self-deprecation cannot be 
indiscriminate. The high status person must 
not deprecate himself on those characteris- 
tics central to his status. This would serve
only to undermine the basis of the subordi-
nate’s respect for him. He must demonstrate
his approachability by acknowledging actual
or alleged defects on nonsalient, unimportant
attributes. For the high status person, then,
an appropriate tactic of ingratiation (or ap- 
proachability demonstration) involves a pat- 
tern of self-presentation wherein important 
positive traits are readily acknowledged along 
with an emphasis on weaknesses in nonessen- 
tial areas. 

The preceding hunches are not the sort of 
stuff from which precise hypotheses can be 
confidently derived. They did help to shape 
the experimental situation described below, 
however, and alerted us to certain promising 
lines of data analysis. In planning the experi- 
ment, a situation was devised so that at vari- 
ous points in the procedure high and low 
status pair members communicated to each 
other about opinion issues (X), about the 
characteristics of the other person (O), and 
about the self (P). The conditions of com- 
munication were carefully controlled. In 
order to arouse motives to ingratiate, sub- 
jects during the first year of the study were 
instructed concerning the vital importance of 
mutual compatibility. In an attempt to pro- 
vide a control comparison with subjects 
communicating under low ingratiation incen- 
tives, different subjects during the second 
year were urged to be themselves and to avoid 
misleading the other person about their true 
nature. Given such settings, it was possible to 
investigate whether: 


1. Relative to high status subjects, low 
status subjects show a greater tendency to 
conform on opinion issues. This tendency 
toward differential conformity should be 
especially pronounced when instructions have 
emphasized compatibility and when the issues 
being discussed are relevant to the basis of 
the status hierarchy. 

2. In presenting their self-ratings to the
other person, high status subjects under in-
structions emphasizing compatibility show a
greater tendency to deprecate themselves on
nonimportant versus important attributes
than low status subjects. Without the com-
patibility instructions the difference between
high and low status subjects should be smaller.




3. When invited to transmit to another 
person their impressions of him, low status 
subjects show a greater tendency to inhibit 
overt flattery of high status ones than do high 
status subjects of low status ones. This 
should be especially the case given high com- 
patibility incentives.


METHOD 
Subjects 


The subjects were 79 undergraduate male volun- 
teers from the Naval ROTC unit at Duke University. 
As indicated above, the experiment was conducted 
over a 2-year period. During the first year (ingratia- 
tion condition), the low status (LS) group was com- 
posed of 21 students in the freshman class, whereas 
the high status (HS) group consisted of 10 seniors 
and 9 juniors. Four subjects (3 HS and 1 LS) were 
not included in the data analysis because of their 
suspicion of the experimental procedure. During the 
second year (control condition) there were 20 fresh- 
men in the LS position, and 8 seniors and 11 juniors 
in the HS position. No subjects were discarded for 
suspicion during the second year, but on a few 
occasions a confederate substituted for a missing 
volunteer. 

Subjects participated in the experiment in groups 
of four; each group was composed of two subjects 
from each status level. Prior acquaintance between 
HS and LS subjects was rare, and since no subject 
was aware of the identity of his actual partner, it is 
hard to see how acquaintance could play a role.


Instructions 


Ingratiation Condition. During the first year, the 
experimenter introduced the study as one concerned 
with testing naval leadership potential. More specifi- 
cally, subjects were told that previous attempts to 
develop such tests in real-life settings had foundered 
because commanders and subordinates had not always 
been initially compatible. The purpose of this study 
was allegedly to find out if “compatible groups pro- 
vide a better setting in which to test leadership 
potential than do incompatible groups.” In order to 
answer this question, subjects were told that leader- 
ship tests would be given during drill periods later 
in the year: 


In these tests, we are going to observe different 
two-person groups. Some of these will be com- 
patible and some will be incompatible. Each test 
will involve one commander (in other words, an 
upper classman) and one subordinate (a freshman). 
Today we are going to make up two commander- 
subordinate pairs simply by putting one upper 
classman and one freshman together, and we are 
trying to make a determination of the degree to 
which each pair is compatible. After forming the 
pairs, in other words, we want to find out whether
the commander ends up thinking highly of the
subordinate and whether the subordinate ends up 
liking and respecting the commander. 


In order supposedly to control for factors associated 
with physical appearance, it was explained that each 
subject would communicate from a private booth to 
the other member of his pair without knowing the 
exact identity of this member.

In order to increase the incentive to be compatible,
each of the four subjects was asked to identify him- 
self by name before being ushered to the booths, and 
each was then asked to write down the name of the 
person in the other status level he would most like 
to have as his partner in the experiment. Once inside
the booths, each subject was told that he would be 
communicating with a person who had expressed a 
preference for working with him. He was then told: 


It looks like there is a good chance that you will 
end up being a compatible pair if it turns out that 
you like him, and he does not change his mind 
about you. For this reason I hope that you will 
make a special effort to gain his liking and respect, 
always remembering your position as commander 
[subordinate]. 


Control Condition. Each of the second-year ses-
sions was presented to the subjects as an attempt to 
study how leaders and followers can get to know 
each other. The emphasis was on the importance of 
obtaining valid information in forming an impression 
and the orienting instructions concluded with the 
following reminders: 


We are interested in studying how well each of you 
can do at learning the kind of person the other is 
when there are differences in status. Therefore, 
it is especially important that each of you respond 
naturally and thoughtfully when it is your turn, 
and that you do not try to mislead the other person 
or to confuse him. He is going to want your frank
and honest opinions in order to form an accurate
impression of you. Keep in mind, then, the impor-
tance of being yourself. . . . We are not especially
concerned with whether you end up liking each
other or not. This is not the point of the experi-
ment. We are interested only in how well you can
do in reaching a clear impression of the other
person.


The anonymity of each subject was assured. 

Except for these orienting instructions, subjects in 
the control condition were exposed to the same sub- 
sequent procedures as subjects in the ingratiation 
condition. These procedures will now be described. 


Procedures for Exchanging Information 


Once each subject was seated in his own private 
booth, it was possible to intercept all outgoing com- 
munications and to provide each subject with stand- 
ard messages. These were allegedly from the un- 
known different-status partner. Thus at no time did 
subjects actually communicate with each other, and
HS and LS subjects were exposed to identical infor-
mation from outside. 

Opinion Exchange. The first task for each subject 
was to exchange with his supposed partner a series 
of 24 opinion items. Twelve of these items, each 
appearing on a separate ballot, were to be initiated by 
the subject and 12 different items were to be initiated 
on alternate trials by his partner. Each ballot con- 
sisted of two identical sections. Each section con- 
tained an opinion statement, a 12-point rating scale 
on which the subject was to indicate his agreement 
or disagreement with the statement, and a space for 
comments. When initiating a statement, the subject 
filled in the upper section, the ballot was delivered 
to his partner, and finally, the ballot was returned 
to the subject with the opinion of the partner on the 
same issue appearing in the lower section. For items 
initiated by the subjects the partner always showed 
close agreement with the subject’s opinion, and a 
short, standard, supportive statement was added. 

Of central concern here, however, was the subject’s 
reply to the 12 opinions initiated by the partner. A 
measure of conformity was derived from the degree 
of expressed agreement on these items. In an attempt 
at greater theoretical precision, these items were of 
three types: those having to do with the Navy, with 
academic life, and those of general interest and mis- 
cellaneous content. All items were chosen from a 
pool of 36 items which had previously been adminis- 
tered to 60 Naval ROTC students in the sophomore 
class. The major criteria for item selection were that 
the mean of the sophomore distribution for the item 
was close to one extreme or the other, and that 
approximately 90% of the sophomores checked within 
5 scale points of the mean. Sample items of each 
type are: 


1. Navy: Because of their more intensive naval 
training, young officers coming out of Annapolis 
should be given positions of authority over Naval 
ROTC students. 

2. Academic: In order to allow for each individual 
to develop his own interests, there should be no re- 
quired courses in college. 

3. Miscellaneous: Television programs have become 
so bad that we should seriously consider federally 
sponsored programing during certain hours of the day. 
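
The two selection criteria described above can be expressed as a short check. The following Python sketch is illustrative only: the 1-to-12 coding of the scale, the `extreme_margin` cutoff for "close to one extreme," the function name, and the sample ratings are all assumptions, since the text states no numeric cutoff for extremity and reports no item-level pretest data.

```python
from statistics import mean

# Assumed coding of the 12-point agreement scale (1 = one pole, 12 = the other).
SCALE_MIN, SCALE_MAX = 1, 12


def is_usable_item(ratings, extreme_margin=3.0, spread=5.0, min_share=0.90):
    """Return True if a pretest item meets both selection criteria.

    extreme_margin is a hypothetical cutoff for "close to one extreme";
    spread and min_share follow the "90% within 5 scale points" criterion.
    """
    m = mean(ratings)
    near_extreme = (m - SCALE_MIN) <= extreme_margin or (SCALE_MAX - m) <= extreme_margin
    share_within = sum(abs(r - m) <= spread for r in ratings) / len(ratings)
    return near_extreme and share_within >= min_share


# Invented pretest ratings for two hypothetical items (not data from the study).
lopsided = [1, 1, 2, 2, 2, 3, 3, 1, 2, 12]  # mean near the low extreme, one outlier
split = [1, 2, 12, 11, 1, 12, 2, 11, 6, 7]  # opinions divided across the scale

print(is_usable_item(lopsided))  # meets both criteria
print(is_usable_item(split))     # mean sits mid-scale, so it is rejected
```

An item that passes such a check has a nearly unanimous norm, which is what allows a partner's opinion at the opposite end of the scale to be treated as clearly discrepant.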


Three items of each type were selected as “critical” 
items. For these, the partner’s initiated opinion was
recorded on the ballot in a position which was 
clearly toward the other end of the distribution from 
the sophomore class mean for that item. For instance, 
whereas 90% of the sophomores had strongly disa- 
greed with Annapolis graduates being given positions 
of higher authority, all subjects received a ballot on 
which their partner agreed with this statement. One 
item of each type was selected as “neutral.” On 
these ballots, the partner endorsed items in the same 
way as members of the sophomore class. Means 
were also later obtained for all items from the re- 
maining members of the freshman, junior, and senior 
classes. These means were used, as described below,
in the analysis of the conformity data. 


Exchange of Self-Presentation Ratings. The second 
task for each subject was to exchange with his part- 
ner a series of self-ratings. The self-rating form used 
consisted of scales separating 24 pairs of antonyms. 
These antonyms had been preselected to form six a 
priori clusters with four pairs in each. The “strength 
of character” cluster, for instance, was composed of 
the following dimensions: forceful-weak, indecisive- 
confident, wishy washy-strong character, and per- 
severing-gives up easily. Other clusters included: 
attractiveness, popularity, competence, integrity, and 
control and adjustment. Each pair of antonyms 
bracketed three 12-point rating scales on the form 
provided, two of which were involved in the pres- 
entation of self-attributes. On the first of these scales 
the subjects were told to rate the items in terms of 
the way they actually saw themselves. On the 
second scale for that item a rating of the ideal self 
was to be made. Subjects were further instructed to 
check a box in the margin beside any dimensions 
which they felt denoted particularly important per- 
sonal characteristics. Instructions to the subject em- 
phasized that the self- and ideal ratings would be 
transmitted to his partner. 

Public Ratings of the Partner. These rating forms 
were then delivered to the partner who was to use the 
third scale to indicate what he thought of the subject
on the same dimensions. In turn, the subject re- 
ceived the partner’s self- and ideal ratings. These 
ratings were also bogus and all subjects received a 
similar set. The way in which the subjects used the 
third scale to evaluate the partner for transmission 
to him constituted the measure of other enhancement. 
While subjects were making these ratings, bogus 
ratings of the subjects were being recorded on the 
subject’s self- and ideal rating sheets. These ratings 
were also the same for all subjects and were uniformly 
toward the positive extreme of the scale. These 
rating forms, earlier initiated by the subjects and now 
containing bogus ratings of them presumably made 
by the partner, were then returned to the subjects 
for examination. It should be mentioned that 12 HS 
and 12 LS subjects in the control condition were 
instructed that these ratings of the partner would 
not be transmitted to him. All remaining subjects 
filled out their evaluations of the partner after clearly 
stated instructions that these evaluations would be 
transmitted to him. The effects of this return-no- 
return variation will be presented in the Results 
section. 

Private Ratings. Finally, each subject was asked to
make a series of private evaluations of his partner. 
These were not to be exchanged but allegedly, in the 
ingratiation condition, were to be used to make the 
preannounced crucial judgments as to the compati- 
bility of the pair. The private ratings did not have 
the same significance for subjects in the control 
condition. Included on the private rating form were a 
number of questions regarding the subject’s percep- 
tion of the partner’s sincerity, and questions dealing 
with the efficacy of the experimental manipulations. 

Once the third task had been completed, the sub- 
jects were brought together to discuss the purpose
of the experiment and the deceptions were revealed.
Subjects were cautioned not to discuss the experi- 
ment with others. 


RESULTS 


The three major sources of dependent vari- 
able data were the opinion ratings transmitted 
by each subject in response to the bogus 
opinions received, the self-ratings prepared for 
communication to the partner, and the im- 
pression ratings assigned by each subject to 
his high or low status partner. From these 
data sources, an attempt was made to de- 
velop indexes for measuring the three tactics 
of conformity, self-presentation, and other en- 
hancement. The following presentation of 
results deals with the differential use of each 
of these tactics as a function of status and 
explores in addition some of their correlates. 


Opinion Conformity 


It will be recalled that each subject received 
12 bogus opinion ratings ostensibly filled out 
by his partner. He was to indicate his own 
opinion on the same ballot, to be returned to 
the initiator. On nine of these ballots, the 
bogus opinions received were highly discrep- 
ant from the nearly unanimous norms of 
sophomore ROTC students. In constructing
an index of degree of conformity, it was 
assumed that the smaller the discrepancy be- 
tween the bogus rating received and the sub- 
ject’s responding rating, the greater the 
degree of social influence on opinion expres- 
sion. By inference from the distribution of 
responses in the normative data, it was judged 
to be extremely unlikely that subjects indicat- 
ing agreement with the bogus ratings were 
not influenced by those ratings. A convenient 
index of conformity, then, is the discrepancy 
between bogus ratings and those given in re- 
sponse to them. As in Tuddenham’s (1959) 
paradigm, a subject can conform to varying 
degrees without actually agreeing with his 
partner. It should also be noted that, unlike 
the typical attitude change study, the subjects 
had never previously expressed themselves on 
the specific items involved and therefore were 
not in any way committed to a rating position
on a “before” measure. 
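
As a minimal sketch of the index just described, the discrepancy can be computed as the distance between the bogus rating and the subject's reply, averaged over items. The function names and the ratings below are hypothetical, and the use of an unsigned (absolute) distance is an assumption rather than a detail given in the text.

```python
def conformity_discrepancy(bogus, response):
    """Distance between the partner's bogus rating and the subject's reply.

    Smaller values are read as greater conformity.
    """
    return abs(bogus - response)


def mean_discrepancy(pairs):
    """Average discrepancy over (bogus, response) pairs for one subject."""
    return sum(conformity_discrepancy(b, r) for b, r in pairs) / len(pairs)


# Invented ratings on three critical items for two hypothetical subjects.
yielding_subject = [(11, 8), (2, 5), (10, 10)]
resistant_subject = [(11, 3), (2, 9), (10, 4)]

print(mean_discrepancy(yielding_subject))   # 2.0
print(mean_discrepancy(resistant_subject))  # 7.0
```

The first subject never agrees fully on two of the three items yet still yields a small discrepancy, which illustrates how conformity can vary in degree without outright agreement.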


Figure 1 graphically illustrates, and Table 
1 summarizes, the statistical analysis of the 
conformity data. Here each individual's score 
has been converted into a discrepancy from 
the appropriate class norm for that grouping 
of items. The normative data for judging the 
degree of conformity in the HS group were 
taken from those senior and junior ROTC 
students who did not participate in the first- 
year experiment, and LS norms were derived 
from nonparticipating freshmen in the same 
way. As it turned out, the freshmen and 
upper-class norms were very similar for each 
item type, so the differences between discrep- 
ancy scores of HS and LS subjects in the 
experimental groups can be evaluated without 
any complicated correction for alleged norma- 
tive differences. It might also be noted that 
the subjects in both the ingratiation and con- 
trol conditions were generally influenced to 
some extent by the bogus opinion ratings re- 
ceived. In 11 out of 12 comparisons between 
subject means and class norms, there was a 
significant amount of social influence. Only in 
the HS control condition with the Navy items 
did the mean fail to differ significantly from 
the class norm. 

It is evident from Table 1 that each of the 
experimental variables contributed a signifi- 
cant effect. The LS subjects conformed more 


TABLE 1

OPINION CONFORMITY: SUMMARY OF ANALYSIS
OF VARIANCE a

Source                       df       MS        F
Between subjects             75
  Ingratiation versus
    control (B)               1     186.12     5.38*
  HS versus LS (C)            1     208.44     6.02*
  B × C                       1        .07
  Error (b)                  72      34.62
Within subjects             152
  Relevance (A)               2      73.44     5.88*
  A × B                       2      32.64     2.61
  A × C                       2     154.21    12.35**
  A × B × C                   2       2.76
  Error (w)                 144      12.49

a For this analysis, cell frequencies were equalized by ran-
domly discarding subjects in the larger cells.
* p < .05.
** p < .01.


[Figure 1. Panels: Control, Ingratiation. Horizontal axis: item relevance (Misc., Academic, Navy). Separate curves for LS and HS subjects.]

FIG. 1. Conformity as a function of issue relevance.


than the HS subjects; all subjects conformed 
more on the average in the ingratiation con-
dition than in the control condition; and
there was a general tendency to conform less 
on academic items than on either Navy or 
miscellaneous items. 
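
As an arithmetic check on Table 1, each F ratio should equal the effect mean square divided by the matching error mean square: Error (b) for the between-subjects effects and Error (w) for the effects involving relevance. The following sketch recomputes the ratios from the tabled mean squares. The variable and function names are illustrative only and are not part of the original analysis.

```python
# Mean squares as printed in Table 1 (opinion conformity).
MS_ERROR_B = 34.62   # Error (b), df = 72: between-subjects error term
MS_ERROR_W = 12.49   # Error (w), df = 144: within-subjects error term

# Effect mean squares keyed by source label, each paired with its error term.
effects = {
    "B": (186.12, MS_ERROR_B),     # ingratiation versus control
    "C": (208.44, MS_ERROR_B),     # HS versus LS
    "A": (73.44, MS_ERROR_W),      # relevance
    "AxB": (32.64, MS_ERROR_W),
    "AxC": (154.21, MS_ERROR_W),
}


def f_ratio(ms_effect, ms_error):
    """Return the F ratio for one effect: MS(effect) / MS(error)."""
    return ms_effect / ms_error


# Round to two decimals, the precision used in the table.
f_values = {name: round(f_ratio(ms, err), 2) for name, (ms, err) in effects.items()}

for name, f in f_values.items():
    print(f"{name}: F = {f:.2f}")
```

Computed this way, the ratios agree with the tabled F values (5.38, 6.02, 5.88, 2.61, and 12.35) to two decimals.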

Interpretation of these main effects, and 
especially the effect of relevance, must await 
consideration of the highly significant inter- 
action between status and relevance. In both 
the ingratiation and control conditions, LS 
subjects conformed more than HS subjects 
only on the items most relevant to the basis 
for the hierarchy. This tendency was espe- 
cially clear (see Figure 1) in the ingratiation 
condition, where increasing relevance leads to 
more conformity in LS subjects and less con- 
formity in HS subjects. However, when 
separate analyses are performed the interac- 
tion between status and relevance is highly 
significant in both the ingratiation condition 
(F=6.87, df=2/144, p < .01) and the 
control condition (F=5.39, df=2/144, 
p < .01).

It would appear that a rather general 
tendency exists for the high and low status 
persons to show differential conformity to the 
extent that the issues involved are relevant 
to the basis of the hierarchy, and that this 
tendency is not markedly affected by varia- 
tions in the importance of being liked. What 
is affected by the arousal of ingratiation mo-
tives, however, is the overall level of conform-
ity behavior manifested. Both HS and LS
subjects conform more under pressures to be 
ingratiating than when specifically cautioned 
to express their true views. This is not par-
ticularly surprising, perhaps, but it does help
to validate the manipulation conveyed by the
two sets of orienting instructions. 


Self-Presentation 


A hypothesis suggested by Blau (1960) was 
presented in the introduction. The present 
version of the Blau hypothesis holds, in effect, 
that the high status person is more likely than 
the low status person to advertise his positive 
attributes in important areas and to deprecate 
himself with respect to less important traits. 
By implication from Blau’s argument that 
this is the leader’s way of demonstrating 
approachability while maintaining the fol- 
lowers’ respect, this tendency should be espe- 
cially pronounced in the ingratiation condition. 

In the present experiment, the importance 
of an attribute was determined by each sub- 
ject for himself. After the exchange of opin- 
ion items was completed, each subject was 
instructed to rate his actual and ideal selves 
on the 24-item rating scale described in the 
procedure section, and to check in the margin 
those traits which he considered “especially 
important personal characteristics.” The aver- 
age subject checked about one of every three 
items as important, and there were only small 
and clearly nonsignificant differences between 
the mean number of items checked in each 
treatment combination or cell. 

In analyzing the data to test the Blau 
hypothesis, two separate scores were derived 
for each subject: the average “actual” rating 
assigned to important and unimportant items, 
respectively. Since each of the 24 scale items 
consisted of one highly favorable and one 
unfavorable antonym, this average rating was 
assumed to reflect the positivity of self- 
description on attributes at two levels of 
importance. These pairs of scores were placed 
for analysis in a mixed factorial design, with 
two between-subjects effects (status and con- 
dition) and a within-subjects effect (impor- 
tance). The means for each cell are portrayed 
in Figure 2, and the results of the variance 
analysis are shown in Table 2. 

Within the ingratiation condition, the Blau 
hypothesis seems nicely confirmed. There is 
no main effect of status or importance, but the 
predicted interaction is significant (F=4.19, 
df=1/64, p < .05). Thus HS subjects in
the ingratiation condition did describe them-
selves more favorably on important than on
unimportant traits, and there was no such
tendency for LS subjects.

[Figure 2. Panels: Control, Ingratiation. Horizontal axis: item importance (Lo, Hi). Vertical axis: favorability.]

FIG. 2. Favorability of self-presentation on items
varying in importance.

When the results for the control condition 
are also considered, however, and when the 
full analysis is examined (Table 2), matters 
become more complicated. Here we see that 
there is an overall main effect of the within-
subjects variable, importance. The attributes
which are designated as important by the 
subject tend to receive more favorable ratings. 
As Figure 2 shows, the one exception to this 
general effect occurs in the LS-ingratiation cell.
This exception is marked enough to produce 
a significant second-order interaction between 
status, condition, and importance. As a func- 


TABLE 2

SELF-PRESENTATION: SUMMARY OF ANALYSIS
OF VARIANCE

Source                       df       MS        F
Between subjects             67
  Ingratiation versus
    control (B)               1       2.73
  HS versus LS (C)            1        .40
  B × C                       1       2.12
  Error (b)                  64       2.44
Within subjects              68
  Importance (A)              1       9.23    17.03**
  A × B                       1        .44
  A × C                       1        .06
  A × B × C                   1       3.60     6.64*
  Error (w)                  64        .54

* p < .05.
** p < .01.


tion of increasing the pressure to be liked, 
HS subjects became generally more modest 
in presenting themselves. In keeping with 
the Blau hypothesis, this tendency was some- 
what greater for the unimportant than for the 
important items. In contrast, when ingratia- 
tion pressures were applied to LS subjects, 
they became slightly more modest about the 
important attributes and considerably more 
favorable in presenting their unimportant 
traits. As a result, when one considers only 
the roughly 16 items not checked as impor- 
tant, the highs became significantly more 
modest when trying harder to be liked 
(p < .02) while the lows became less so 
(p < .06). 

Some questions may certainly be raised 
concerning the degree of fit between the im- 
portant-unimportant dichotomy included in 
the analysis of self-presentation data and the 
conceptual distinction suggested by Blau’s hy-
pothesis. As far as HS subjects are concerned,
the degree of fit depends on whether or not 
checking a trait as important is equivalent to 
saying “these traits are relevant to my claims 
for respect.” An examination of the content 
of items checked as important shows some 
tendency for HS subjects to emphasize 
strength of character, dependability, and com- 
petence; while LS subjects seem more con- 
cerned with friendliness, warmth, and popu- 
larity. While the differences between HS and 
LS subjects in this regard were not significant, 
the concern of the HS group with task-relevant
dimensions suggests that the important- 
unimportant distinction does relate to Blau’s 
discussion of the leader’s self-deprecation on 
nonsalient characteristics. 


Other Enhancement 


One of the most obvious and ubiquitous 
tactics of ingratiation involves the expression 
of compliments or the communication of flat- 
tering appraisals. In the present experiment, 
this kind of enhancement of the other was 
possible in the final exchange of information, 
when each subject communicated his evalua- 
tions of his partner presumably to that 
partner. The notion of flattery or other 
enhancement seems to imply that one person 
overevaluates another. But the question im-
mediately becomes, overevaluation with re-
spect to what standard or baseline?

Since there is no available way of determin- 
ing what each subject really thought of his 
partner in the present experiment (even the
“private ratings” will undoubtedly be affected 
by commitment to the prior public ratings), 
the evidence regarding other enhancement is 
necessarily circumstantial. Ratings made by 
subjects at one status level in the ingratiation 
condition may be compared with ratings made 
by other subjects at the same status level in 
the control condition. It will also be recalled 
that some of the subjects in the control con- 
dition were informed that their ratings of 
their partner would be transmitted to him, 
while others were assured that their ratings 
would be seen only by the experimenter. 
Presumably this should have given us some 
further leverage in the attempt to tease out 
the relevant variables governing the favor- 
ability of ratings describing the partner. 

As with the self-presentation data, the other 
ratings were converted into favorability scores, 
obtained by summing the scale scores for 
all items. Table 3 presents the results of 
such a summation, indicating the means and 
standard deviations for both status levels in 
the ingratiation condition, in the control con- 
dition, and in the return and no-return 
subconditions. Turning first to HS subjects, 
the pattern of means suggests that the in- 
gratiation instructions themselves were not a 
crucial determinant of the favorability ratings, 
but that assurances made to the subjects that 
the ratings would not be seen by their partners 


resulted in a reduction in rating favorability. 
The difference between the ingratiation condi- 
tion and the control-no-return condition is 
very close to significance (t=2.01, p < .06). 
The pattern of means for LS subjects is 
more complicated. Relative to HS subjects, 
LS subjects in the ingratiation condition were 
clearly more favorable in their ratings (t=
3.06, p < .01). Turning to the two control
subconditions, LS subjects were significantly 
less favorable in the no-return treatment 
(t=2.69, p < .05) but even less favorable 
in the return treatment (t=4.31, p < .001),
when both means are compared with the LS
ingratiation condition mean. Since this very
low mean for subjects in the LS control-return
cell seemed to have no ready explanation, the 
conformity data for all control subjects were 
examined. By an accident of assignment to 
the subconditions, LS return subjects hap-
pened to conform significantly less than LS
no-return subjects (t=2.0572, p < .05).
There was no difference in the conformity 
scores of HS return and HS no-return subjects. 
Apparently, since the procedures were identi- 
cal in the two control subconditions up to the 
point of making the ratings of the partner, 
subjects in the LS return subcondition happened
to find themselves in greater disagreement
with the incoming opinion statements than
those in the LS no-return subcondition. This
provides at least one reason why they would 
subsequently show a more negative reaction 
to the alleged initiator of these opinions. 
The fact that assignment of LS subjects to 
return and no-return subconditions was biased 


TABLE 3

EFFECTS OF STATUS ON FAVORABILITY RATINGS OF PARTNER IN THREE CONDITIONS
(Means,a standard deviations, tests of HS versus LS differences)

                                   Group
                         HS                       LS
Condition           M       SD      N        M       SD      N     t_diff
Ingratiation      203.84   14.10   19      218.86   16.61   21      3.06*
Control
  Total           194.37   24.56   19      198.25   14.88   20       .60
  Return          200.86   26.72    7      189.75   15.15    8       .99
  No-return       190.58   22.56   12      203.92   12.19   12      1.78

a The larger the mean, the more favorable the summed ratings.
* p < .01.




with respect to opinion conformity scores 
obviously complicates the interpretation of 
differences in indexes of other enhancement. 
Perhaps the most reasonable solution to the 
problem thus posed is to ignore the subcon- 
dition variations and to make a comparison 
only between the (combined) control con- 
dition and the ingratiation condition, for 
HS and LS subjects. An analysis of variance 
of the favorability means for these four basic 
conditions resulted in two main effects. Sub- 
jects in the ingratiation condition were 
significantly more favorable in rating their 
partners than subjects in the control condi- 
tion (F=13.85, df=1/75, p < .001). 
Also, LS subjects expressed greater admira-
tion of HS subjects than vice versa (F=5.47,
df=1/75, p < .05). The interaction be-
tween status and condition was not significant
(F=1.90, df=1/75). 
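
The HS versus LS t values in Table 3 can be approximately recovered from the tabled means, standard deviations, and cell sizes. The sketch below assumes an ordinary independent-groups t test with a pooled variance estimate; the report does not state which formula was used, so this is an assumption, and the function and dictionary names are illustrative only.

```python
import math


def pooled_t(mean_hs, sd_hs, n_hs, mean_ls, sd_ls, n_ls):
    """Independent-groups t (LS minus HS) from summary statistics.

    ASSUMPTION: pooled-variance t test; the source does not name its formula.
    Returns the t value and its degrees of freedom.
    """
    df = n_hs + n_ls - 2
    pooled_var = ((n_hs - 1) * sd_hs ** 2 + (n_ls - 1) * sd_ls ** 2) / df
    se = math.sqrt(pooled_var * (1 / n_hs + 1 / n_ls))
    return (mean_ls - mean_hs) / se, df


# Rows of Table 3: (HS mean, HS SD, HS N, LS mean, LS SD, LS N).
table3 = {
    "ingratiation": (203.84, 14.10, 19, 218.86, 16.61, 21),
    "control_total": (194.37, 24.56, 19, 198.25, 14.88, 20),
    "control_return": (200.86, 26.72, 7, 189.75, 15.15, 8),
    "control_no_return": (190.58, 22.56, 12, 203.92, 12.19, 12),
}

t_values = {name: pooled_t(*row) for name, row in table3.items()}

for name, (t, df) in t_values.items():
    # The table reports unsigned values, so the absolute value is printed.
    print(f"{name}: |t|({df}) = {abs(t):.2f}")
```

Under this assumption the recomputed values fall within about .03 of the tabled t's; the small differences are presumably due to rounding in the tabled means and standard deviations.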

In the introductory section it was reasoned 
that low status persons are in a poor position 
to use other enhancement as an ingratiation 
tactic. The present results provide no evi- 
dence to confirm this reasoning, but it should 
be stressed again that it is very difficult to 
derive an index which would be a reasonable 
measure of flattery or other enhancement in 
this setting. The LS subjects were more 
favorable in their ratings than HS subjects in 
both the control and the ingratiation condi- 
tions, though the difference was significant 
only in the latter case. Some portion of the 
variance, then, seems attributable to the 
likely fact that the stereotyped impressions 
which most upper classmen have of most 
freshmen are less favorable than the fresh- 
men’s stereotype of them. Both LS and HS 
subjects became more favorable under instruc- 
tions to be ingratiating. Though LS subjects 
showed a greater increase than the highs, the 
difference between these differences (similar 
to the interaction reported above) was not 
significant (t=1.38).

Finally, it might be noted in passing that 
to some extent the problem of interpreting 
rating differences between HS and LS sub- 
jects was anticipated. It will be recalled that 
the subjects rated their partners on the same 
sheet with the partners’ alleged (actual and 
ideal) self-ratings. Bogus self-ratings were 


prepared so that there were several items on 
which the partner indicated a large discrep- 
ancy between his actual and ideal self. Many 
of these items were also checked as being 
important to the partner. The intent was to 
provide cues for the elicitation of flattery in 
the ingratiation condition by indicating areas 
in which the other person was dissatisfied with 
himself. If the rating scale was being used 
in the service of ingratiation needs, then HS 
and LS subjects should have differed espe- 
cially on these items in the ingratiation condi- 
tion. Separate analysis of high and low 
discrepancy items revealed that, if anything, 
differences in the ingratiation condition were 
greater for the low-discrepancy items. In 
the control condition the average favorability 
assigned to high-discrepancy items was almost 
identical to that assigned to low-discrepancy 
items within each status level. 


Interrelationships among the Dependent 
Variables 


Relationships among the various potential 
measures of ingratiation tactics obviously 
deserved exploration, though there were no 
strong expectations about what such individ- 
ual difference analyses would yield. The two 
experimental conditions investigated in the 
present study may be viewed as posing for 
the subjects a problem of determining the 
most appropriate social response under the 
constraints inherent in presenting information 
about the self. It might be said that the
ingratiation instructions made salient the 
social implications of the subjects’ behavior, 
while the control situation emphasized the 
existence of reality constraints. Presumably
there are alternative ways of responding to 
the problem posed by the instructions in each 
condition. The examination of individual dif- 
ferences might reveal either of two general 
patterns of correlation among the dependent 
variable measures. Especially in the ingratia- 
tion condition, the different communication 
tasks in the experimental sequence might be 
viewed as alternative ways of accomplishing 
the goal of increasing one’s attractiveness to 
his partner. In that case, we would expect
to find inverse correlations between the de-
pendent variables reflecting a kind of either-
or solution of the problem. While such a
correlational pattern was conceivable, it was 
considered more likely that subjects would 
systematically vary in their level of response 
to the instructions, rather than in their pref- 
erence for particular tactics to the exclusion 
of others. This would result in positive corre- 
lations between responses to the different 
communication tasks, indicating that subjects 
who adopt one kind of tactic are also more 
prone to adopt the other kinds as well.

Relations between Conformity and the
Impression Ratings. Did the subjects who 
conformed more to their partner subsequently 
rate him more favorably? How was the re- 
lationship between these two response clusters 
affected by differences in instructions and in 
status level? Table 4 presents the product- 
moment correlations which are relevant for 
attempting to answer these questions. While
only 2 of the 12 correlations are actually 
significant, there are interesting and consist- 
ent differences in their magnitude as a joint 
function of status and the relevance of the 
opinion to the status hierarchy. Specifically, 
HS subjects who conformed on the more rele- 
vant items also transmitted more favorable 
ratings to the partner. The pattern for LS 
subjects seems to be almost a mirror image 
of the HS pattern. Here those who tended to
conform more on the less relevant items were 
the ones most likely to transmit favorable 
ratings to their partners. It is interesting to 
note that the variation in instructions to be 
ingratiating had very little effect either on 


TABLE 4

CORRELATIONS BETWEEN CONFORMITY AT DIFFERENT
LEVELS OF RELEVANCE AND FAVORABILITY OF
RATINGS OF PARTNERS

                                 Item
Group             Miscellaneous    Academic     Navy
Control
  HS                  .031           .373       .424
  LS                  .559**        -.130      -.120
Ingratiation
  HS                  .009           .339       .452*
  LS                  .383           .211      -.159


the overall magnitude or the patterning of 
the correlations.

Any attempt to explain the data of Table 
4 would clearly be post hoc. Nevertheless, the 
pattern of correlations does suggest that con- 
formity and rating favorability are correlated 
only when the average amount of conformity 
is relatively low (see Figure 1). It does not 
seem too surprising that attraction and con- 
formity are unrelated when there is fairly 
strong situational pressure to conform. This 
would be a typical instance of stimulus con- 
straints washing out systematic individual 
differences. When such pressures or con- 
straints are reduced, on the other hand, con- 
formity and attraction tend to be more 
strongly related. This might simply be a 
function of the fact that some subjects were 
in closer agreement with the message initi- 
ator’s expressed opinions (because of actual 
opinion variation in the sample) and that 
these “conformers” naturally liked the initi- 
ator better because his views were more 
similar to their own. Or, at least some part 
of the effect might be a function of disso- 
nance reduction (Festinger, 1957) or balance 
restoration (Heider, 1958). That is, having 
conformed to the initiator in the absence of 
strong situational pressures to do so, the sub- 
ject must find some justification for his com- 
pliant behavior. By expressing relative ad- 
miration or liking for the partner, regardless 
of whether this positive impression is to be 
transmitted, the subject may create a justifi- 
cation for conformity when other reasons are 
not apparent. In more technical terms, the 
individual thus reduces the dissonance or im- 
balance created by conforming when the 
social conditions do not require close agree-
ment.

Relations between Conformity and Self- 
Presentation. The preceding explanation 
rather casually introduced the assumption 
that high status persons may demonstrate 
their approachability by conformity on irrele- 
vant items while maintaining their true opin- 
ions on items more relevant to the respect 
they wish to receive as leaders. In the intro- 
duction, however, approachability was explic- 
itly linked to the tactic of describing oneself 
more favorably on important than on unim-
portant items. Differential conformity and
differential self-presentation both seem to be 
plausible means by which the high status 
person can demonstrate approachability while 
eliciting respect. What is the relationship 
between these two behavior patterns? 

Table 5 presents the correlations between 
the respective discrepancy scores. The con- 
formity discrepancy score was determined by 
subtracting each individual’s score for con- 
formity on Navy items from his combined 
conformity score on Academic and Miscel- 
laneous items. The self-presentation discrep- 
ancy score was obtained by subtracting each 
individual’s mean favorability score for Unim- 
portant items from his mean favorability on 
Important items. The tabled results show 
that there is a significant relationship between 
the two discrepancy scores for HS subjects in 
each condition. From this finding we may
infer that HS subjects who are especially 
concerned with the balance between main- 
taining respect and demonstrating approach- 
ability will show this concern both in respond- 
ing to others’ opinions and in presenting their 
own characteristics to others. These individual 
differences in concern are apparent in both 
ingratiation and control conditions, so we 
are apparently dealing with a general style 
of representing oneself as a high status person, 
rather than a disposition to be more or less 
responsive to the arousal of ingratiation 
motives. 
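
The two discrepancy scores and their product-moment correlation can be sketched as follows. The subject values are invented purely for illustration and are not data from the study, "combined" conformity is assumed here to mean a simple sum of the Academic and Miscellaneous scores, and all function names are hypothetical.

```python
import math


def conformity_discrepancy(misc, academic, navy):
    """Combined Academic and Miscellaneous conformity minus Navy conformity.

    ASSUMPTION: "combined" is taken as a simple sum of the two scores.
    """
    return (misc + academic) - navy


def self_presentation_discrepancy(ratings, important):
    """Mean favorability on Important items minus mean on Unimportant items."""
    imp = [r for r, flag in zip(ratings, important) if flag]
    unimp = [r for r, flag in zip(ratings, important) if not flag]
    return sum(imp) / len(imp) - sum(unimp) / len(unimp)


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical subjects (invented values, NOT data from the study):
# (misc, academic, navy) conformity scores, self-ratings, importance checks.
subjects = [
    ((6.0, 4.0, 3.0), [9, 8, 6, 5], [True, True, False, False]),
    ((5.0, 5.0, 6.0), [7, 7, 7, 6], [True, False, False, True]),
    ((7.0, 3.0, 2.0), [10, 9, 5, 4], [True, True, False, False]),
]

conf_d = [conformity_discrepancy(*scores) for scores, _, _ in subjects]
self_d = [self_presentation_discrepancy(r, imp) for _, r, imp in subjects]
r = pearson_r(conf_d, self_d)
print(f"r = {r:.3f}")
```

A positive r in such a computation corresponds to the pattern reported for HS subjects: those who conform more on less relevant than on Navy items also describe themselves more favorably on important than on unimportant traits.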

Relations between Self-Presentation and 
Ratings of Other. There were no clear expec- 
tations about relations between the final two 


TABLE 5

CORRELATIONS BETWEEN CONFORMITY DISCREPANCY a
AND SELF-PRESENTATION DISCREPANCY b BY
EXPERIMENTAL CONDITION

                    Condition
Group         Control     Ingratiation
HS             .595*         .421*
LS             .118          .210

a Determined by subtracting the degree of conformity on
Navy items from the combined conformity scores on Academic
and Miscellaneous items.

b Determined by subtracting the mean favorability on Unim-
portant traits from the mean favorability on Important traits.

* p < .05.


dependent variable measures. If some portion 
of the variance in both sets of ratings were 
contributed by a response set to respond 
favorably, then one would predict a positive 
correlation between self- and other ratings. 
On the other hand, at least the possibility 
existed that those who presented themselves 
in an unfavorable light would tend to enhance 
the partner by a form of contrast effect. 
Actually, none of the correlations between 
favorability of self-presentation and favora- 
bility of other ratings approached significance. 
Since the most interesting self-presentation 
results occurred in connection with the dis- 
crepancy between ratings of important and 
unimportant items, these discrepancy scores 
were also correlated with the other rating 
favorability scores. These correlations ranged 
from —.14 to .13 in the various treatment 
conditions, values which obviously are well 
within the limits of chance variation.


Perceptions of Flattery 


It should again be emphasized that all 
subjects, regardless of status, received iden- 
tical information about their alleged partners 
up to the point at which the impression rat- 
ings were to be exchanged. Beyond this 
point, all but the subjects in the no-return 
subcondition of the control treatment contin- 
ued to receive identical information—i.e., 
each subject received his own self-ratings 
back presumably after these had been shown 
to the partner and the partner had indicated 
his own ratings of them. The no-return
subjects were assured that their ratings of 
the partner would not be transmitted to them, 
and in line with these instructions they re- 
ceived no ratings from the partner. The re- 
maining subjects were exposed to highly 
favorable bogus ratings of their personal at- 
tributes on the 24-item scale. It is relevant 
to the theoretical purposes of the experiment 
to inquire whether HS subjects in the ingra- 
tiation condition attributed more flattering 
intentions to LS subjects than vice versa. 
We might expect this to be the case since 
we have argued above that there are con-
straints that operate to inhibit upward flattery
in a status hierarchy. When the leader re-
ceives a highly positive evaluation, therefore,
he is likely to suspect the intentions of the
follower and to be alerted to the likelihood 
that deceit and flattery are involved. Such a 
possibility is less likely to occur to the low 
status recipient of a highly positive evalua- 
tion, or to subjects at either status level in 
the control condition. 

Three scale items relating to the perception 
of flattery were embedded in the final post- 
experimental questionnaire. Ratings on these 
three items were combined to form a single 
measure of perceived flattery for each subject, 
and these measures were then compared across 
conditions. The results of this comparison are
presented in Table 6. It is apparent that in 
the ingratiation condition HS subjects do 
indeed attribute more flattering intentions to 
LS subjects than vice versa. This difference 
washes out completely in the control condi- 
tions. When ratings in the control-return 
subcondition are compared with ratings in 
the ingratiation condition, the difference be-
tween the differences only approaches signifi-
cance. It is important to note, however, that
the means of the two control subconditions 
were almost identical. When subjects are 
operating under the control instructions, in 
other words, it seems to make little difference 
whether or not they are exposed to very 
favorable feedback from their partner. They 
do not apparently use this information to 
make inferences about flattering intentions. 
When all control subjects are compared with 
subjects in the ingratiation condition, the 
difference between the status differences does 
reach significance (p < .05). 
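
The "difference between the differences" is a contrast of the form (HS - LS in ingratiation) minus (HS - LS in control). The sketch below computes that contrast from the Table 6 means and forms a t ratio using the pooled within-cell variance of the four cells involved. The choice of error term is an assumption, since the report does not state it; the return-subcondition cell sizes (7 and 8) are taken from Table 3; and the names used are illustrative only.

```python
import math

# (mean, SD, N) of perceived flattery; means and SDs as given in Table 6.
# Return-subcondition cell sizes (7 and 8) follow Table 3.
cells = {
    "ingr_HS": (13.05, 4.14, 19),
    "ingr_LS": (9.62, 3.16, 21),
    "ctrl_HS": (11.68, 4.46, 19),
    "ctrl_LS": (11.85, 3.41, 20),
    "ret_HS": (11.71, 4.26, 7),
    "ret_LS": (12.25, 4.58, 8),
}


def difference_of_differences(hs1, ls1, hs2, ls2):
    """Return (contrast, t) for (hs1 - ls1) - (hs2 - ls2)."""
    groups = [cells[key] for key in (hs1, ls1, hs2, ls2)]
    contrast = (groups[0][0] - groups[1][0]) - (groups[2][0] - groups[3][0])
    # ASSUMPTION: the error term is the pooled within-cell variance
    # of the four cells entering the contrast.
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    df_error = sum(n - 1 for _, _, n in groups)
    se = math.sqrt(ss_within / df_error * sum(1 / n for _, _, n in groups))
    return contrast, contrast / se


contrast_total, t_total = difference_of_differences(
    "ingr_HS", "ingr_LS", "ctrl_HS", "ctrl_LS")
contrast_return, t_return = difference_of_differences(
    "ingr_HS", "ingr_LS", "ret_HS", "ret_LS")

print(f"ingratiation vs. control:        contrast = {contrast_total:.2f}, t = {t_total:.2f}")
print(f"ingratiation vs. control-return: contrast = {contrast_return:.2f}, t = {t_return:.2f}")
```

With these assumptions the two ratios come out close to the values reported in the footnote to Table 6 (2.101 and 1.690), which suggests, without establishing, that a pooled within-cell error term of this kind was used.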




The results on the perception of flattery 
make very good sense, then, for they illustrate 
the dangers of attempts at ingratiation when 
the actor is in a position of low power or 
status, and when instructions have emphasized 
each person’s stake in being attractive to the 
other. While compliments moving downward 
in a hierarchy tend not to raise questions 
about sincerity and frankness, the same com- 
ments moving upward smack of flattery and 
deceit. This result is quite consistent with 
the results of an experiment by Jones, Jones, 
and Gergen (1963) which was concerned with 
the perceptions of a conformist operating 
under different social conditions. In that 
experiment bystanders rated a persistently 
agreeable person much less favorably when 
it was clear that he had a high stake in ap- 
pearing attractive to another person than 
when no such incentives were apparent. The 
two experimental conditions closely resembled 
the ingratiation and control conditions of the 
present study. 


DISCUSSION 


The results give abundant testimony to the 
importance of the status variable in determin- 
ing the content of self-reflecting communi- 
cations in a well-defined hierarchy. The major 
question which arises in attempting to inter- 
pret the various differences in behavior asso- 
ciated with status, is whether or not these 
differences are systematically affected by in- 
gratiation pressures. The present experiment 
was conceived as an attempt to show that 


TABLE 6

PERCEPTION OF FLATTERY
(Mean a postexperimental ratings in each condition and differences between them)

                                  Group
                         HS                     LS
Condition b         M      SD      N       M      SD      N     p_diff
Ingratiation      13.05   4.14    19      9.62   3.16    21      .01
Control
  Total           11.68   4.46    19     11.85   3.41    20      ns
  Return          11.71   4.26     7     12.25   4.58     8      ns

a These means are based on the following items: completely sincere vs. on the phony side, trustworthy vs. unreliable, brutally
frank vs. flatterer. The higher the mean score, the greater the perceived flattery.

b Comparisons across conditions. Ingratiation (HS-LS) versus control (HS-LS): t for (13.05 - 9.62) - (11.68 - 11.85) =
2.101, p < .05. Ingratiation (HS-LS) versus control-return (HS-LS): t for (13.05 - 9.62) - (11.71 - 12.25) = 1.690, p < .10.




high and low status persons adopt different 
tactics of presenting themselves when each is 
concerned with creating an attractive im- 
pression for the other’s consumption. The 
importance of this concern was varied by 
exposing subjects in successive years to two 
quite different sets of instructions. 

There is no question that the instructions 
had an effect on the dependent variables of 
the experiment. Under the ingratiation in- 
structions, all subjects conformed more and 
their ratings of the partner were generally 
more favorable. It is not quite clear, how- 
ever, that the ingratiation instructions gave 
rise to different interpersonal tactics as a 
function of status. The following discussion 
attempts to assess the relative contribution of 
aroused motives to be ingratiating to the 
observed differences between high and low 
status subjects. 


Conformity, Ingratiation, and Status 


The most novel and intriguing feature of 
the conformity data would seem to be the 
variations captured by the statistical inter- 
action between status and issue relevance. 
Especially in the ingratiation condition, there 
was a clear tendency for HS subjects to con- 
form less on relevant than on irrelevant issues. 
As noted above, this finding fits nicely with 
the notion that leaders must maintain the 
respect of the followers in order to be effec- 
tive, but they must also find some means of 
demonstrating their approachability. It is per- 
haps not too surprising that HS subjects were 
able to resist influence on the Navy items, but 
what is rather remarkable is the extent to 
which they conformed on the less relevant, 
Miscellaneous issues. On these issues, the 
message initiator took a stand which was, on 
the average, six points removed from the 
class norm. In response to the influence pres- 
sure implicit in this discrepant stand, the 
average HS subject responded with opinions 
approximately midway between the class 
norm and the initiator’s position. It would be 
wrong to suggest, then, that HS subjects 
moved to a position of complete agreement 
on these items, but a substantial amount of 
influence nevertheless took place. One might 
say that the average HS subject moved close
enough to the low’s alleged position that he
was in general, but not in complete, agree- 
ment. 

It seems clear, furthermore, that the extent 
of social influence on the miscellaneous items 
was definitely a function of the ingratiation 
instructions. The highs showed significantly 
less conformity on these issues under control 
than under ingratiation instructions (F= 
6.69, df=1/72, p < .03). Because of the 
amount of conformity in the ingratiation con- 
dition, and the significant reduction in the 
control condition, we may conclude that 
agreement on issues irrelevant to the status 
hierarchy serves for HS subjects as a means 
of increasing their attractiveness, i.e., their 
approachability. The fact that HS subjects 
also show slightly less conformity on the Navy 
items in the control condition (F=2.29, df= 
1/72, p=ns) does not seriously qualify this 
conclusion, though it suggests that the leader’s 
attempt to increase approachability may even 
involve some conformity on issues related to 
the hierarchy. 

In marked contrast to HS subjects, LS 
subjects in the ingratiation condition showed 
greatest conformity on those issues most rele- 
vant to the hierarchy. Undoubtedly, some of 
this differential conformity on high relevance 
items may be attributed to the direct or in- 
formational effects of being exposed to an 
“expert’s” opinions. This would assume that 
HS subjects, relative to the lows, were per- 
ceived to be more knowledgeable about life in 
the Navy than about such an issue as the 
contribution of comic books and crime movies 
to the rise in juvenile delinquency (one of 
the Miscellaneous items). This seems a 
plausible assumption. In addition to the di- 
rect effects of expertise on the differential 
conformity of LS subjects, however, ingra- 
tiation instructions also make a contribution. 
Thus LS subjects conform significantly more 
under ingratiation instructions on both Navy 
issues (F=4.08, df=1/72, p < .05) and 
Miscellaneous issues (F=6.85, df=1/72, 
p < .01).

In dealing with the joint effects of relevance 
and ingratiation pressures on conformity, it is 
quite pertinent to consider an important con-
ceptual distinction which has been introduced,
and periodically reintroduced, into theoreti-
cal analyses of social influence processes.
This is the distinction between normative and 
informational pressures (Deutsch & Gerard, 
1955), or between the related concepts of 
direct and reflected comparison (Gerard, 
1961). Direct social comparison (correspond- 
ing to informational influence) involves reli- 
ance upon another as an impersonal mediator 
of certain facts about reality. Reflected social 
comparison (corresponding to normative 
social influence) occurs when an individual is
in a position to be rewarded for compliance or 
punished for noncompliance; the individual 
conforms in order to achieve some interper- 
sonal goal (such as praise or acceptance) 
rather than solely to have his opinions coin- 
cide with what now appears to be a correct 
view of reality. 

The ingratiation instructions in the present 
experiment presumably increase the amount 
of influence pressure based on the reflected 
component without changing the contribu- 
tion of the direct comparison component. We 
may conclude that the significant main ef- 
fect of the ingratiation instructions on the 
general level of conformity behavior reflects 
this increment stemming from reflected com- 
parison. On the other hand, the pattern of 
interaction between issue relevance and status 
does not shift significantly as a function of 
ingratiation versus control instructions. While 
status-related changes as a function of rele- 
vance are monotonic only under ingratiation 
instructions, the most conservative conclu- 
sion is that direct comparison largely deter- 
mines the greater differential conformity on 
the relevant Navy items. Thus while reflected 
comparison seems to have much to do with 
the general level of conformity behavior in 
both status groups, variations as a function of 
issue relevance are more likely determined by 
the subjects’ judgment of the validity of the 
information received. Whether or not these 
specific conclusions are supported by addi- 
tional research, the significant simple inter- 
action between status and relevance is a 
provocative finding, one which is of interest 
both for practical and theoretical reasons. 


Status Differences, Ingratiation Pressures, and 
Self-Presentation 


In response to a suggestion derived from a 
discussion by Blau (1960), the self-presenta- 
tion data were analyzed separately for items 
checked as personally important attributes 
and those checked as unimportant. The im- 
portance dimension proved to be crucial in 
understanding status-related responses to in- 
gratiation pressures. While HS subjects 
always presented themselves more favorably 
on important than on unimportant attributes, 
they became significantly more modest on 
both kinds of attributes when under instruc- 
tions to be ingratiating. As the argument was 
developed in the introduction, the greater 
favorability of self-presentation on important 
versus unimportant traits seems quite consist- 
ent with the presumed interest of the leader 
in emphasizing certain strong points to gain 
respect and certain weaknesses to increase 
approachability. This tendency was slightly 
(and not significantly) greater after ingratia- 
tion than after control instructions. The 
general tendency to become more self-depre- 
cating on both kinds of items is clearly the 
more striking feature of the data for HS 
subjects in the ingratiation condition. This 
is certainly compatible with the notion of a 
greater concern with approachability in this 
condition, though it is not clear why the highs 
show a slight tendency to deprecate them- 
selves on important traits as well as the unim- 
portant ones. 

The general tendency to become more 
modest under ingratiation instructions as- 
sumes greater theoretical importance when 
compared, not only with LS subjects of the 
present experiment, but with the data on 
female subjects from an experiment by Jones, 
Gergen, and Davis (1962). In this latter 
experiment, some subjects were given instruc- 
tions to win the affection of a graduate 
student interviewer by tailoring their self- 
descriptions along lines which they felt he 
would admire—even if they felt stretching 
the truth was required—while other subjects 
were instructed to be completely candid and 
accurate in their self-presentations. Girls in
the so-called “hypocrisy” condition were 
significantly more positive in their descriptions
than girls in the “accuracy” condition. It is
quite possible, of course, that there are im- 
portant sex-related differences in responding 
to ingratiation instructions. However, it 
seems pertinent to note that the girls in the 
previous experiment were in an essentially 
lower status position vis-à-vis the graduate 
interviewers. Since most of them were college 
sophomores or juniors, they were naturally 
younger and less advanced educationally than 
their interviewers. Therefore, it is probably 
not stretching coincidence too much to note 
the parallel between the girls in the earlier 
experiment by Jones, Gergen, and Davis 
(1962) and the lower status freshmen in the 
present study. 

The current LS subjects did not show any 
general tendency to become more self-enhanc- 
ing as a function of the ingratiation instruc- 
tions. They were similar to HS subjects in 
rating themselves less favorably on the im- 
portant items in the ingratiation versus the 
control condition. However, in striking de- 
parture from the high status “modesty” effect, 
and in line with the girls’ reactions in the pre- 
vious experiment, LS subjects’ ratings of unim- 
portant attributes were significantly more 
positive in the ingratiation than in the control 
condition. A possible explanation for the rat- 
ings in this condition involves assuming that a 
certain amount of defensiveness character- 
izes the behavior of a low status person when 
it is important that he be liked or accepted 
by a high status person. We may expand on 
Blau’s (1960) reasoning to argue that a low 
status person cannot advertise his weaknesses 
without endangering his reputation as a 
valuable team member, and to suggest in 
general that a person in a weak position who 
further emphasizes his personal failings 
arouses a certain embarrassment in others 
and thereby makes himself less attractive. 

This might explain what happens on the 
unimportant items, but what of the reverse 
trend on the important items? Why did LS 
subjects become more self-effacing on these 
items when under pressure to make themselves 
more attractive? The answer may lie in the 
fairly subtle understanding, even by freshman 
undergraduates, of the dynamics of leader- 
follower relations. The low status person who 
describes himself very favorably on important
personal attributes may be viewed as a pre-
sumptive upstart, one who may annoy the
leader by usurping some of the characteristics 
of his role. It may be, then, that LS subjects 
lowered their self-evaluations on important 
items because they wanted to avoid the ap- 
pearance of claiming leadership qualities and 
of thereby threatening the leader’s authority. 
The result is, in the ingratiation condition, 
an equalization of favorability across different 
sorts of items: relative to their self-presenta- 
tions under control conditions, LS subjects 
were more modest in rating their important 
characteristics and more immodest in rating 
their less important ones. 


Ingratiation Pressures and Other Enhance- 
ment 


We have already indicated some of the 
difficulties in developing a valid index of 
flattery or other enhancement. However, the 
circumstantial evidence suggests that the high 
status subjects are somewhat more positive 
in their transmitted ratings than they really 
feel, less because of the significance of their 
favorable ratings as an ingratiation tactic 
than because of a wish to avoid hurting the 
feelings of their LS partners. This seems to 
be the most reasonable interpretation of the 
pattern of favorability means, a pattern which 
showed no difference between control and 
ingratiation instructions except for the less 
favorable ratings in the control-no-return 
subcondition. 

As far as LS subjects are concerned, there 
was a rather striking general increase in 
favorability from control to ingratiation con-
ditions. While it is difficult to know what
proportion of this difference is actually a 
function of tactical considerations, it is quite 
clear that our expectations about the inhibi- 
tion of overt flattery were not supported by 
these rating data. Apparently, many LS sub- 
jects felt that they could increase their at- 
tractiveness by expressing more favorable 
judgments about the HS person than he had
expressed about himself. We can establish 
some contact with the original hunch that 
this would not occur, by citing some evidence 
suggesting that it might at least have been a 
tactical error. It will be recalled that, when
all subjects were exposed to very favorable
ratings coming from their partners, HS sub- 
jects in the ingratiation condition perceived 
the sender to be more flattering and deceitful 
than subjects in any other condition. While 
the actual ratings made by LS subjects were 
not as positive as these bogus ratings, it 
would appear that they were gambling rather 
recklessly by attempting to curry favor 
through other enhancement. They might have 
hurt their cause by this self-evident gesture. 
In any event, it seems worth pursuing the 
notion that subordinates cannot resist expres- 
sing complimentary evaluations to their 
leader, even though this is not an effective 
tactic. 


Apologia 


As perhaps has been all too clear, the 
present study has attempted to find sugges- 
tive answers to a great many questions 
through an experimental design which had 
obvious weaknesses. Almost without excep- 
tion, the interpretation of each finding could 
have been more securely established if a cer- 
tain additional control group had been run. In 
one case, it might have clarified things to com- 
pare HS subjects’ responses to those of seniors 
communicating with other seniors. In another 
case, perhaps a control group of non-ROTC 
volunteers would have shed valuable light. 

The variable of status does not have a clear 
psychological meaning, and in the present 
experiment the status distinction involves a 
compound of differences in age, academic 
seniority, intellectual and social sophistica- 
tion, specific training and experience in Navy 
ways, greater independence of the home en- 
vironment, etc. In addition, these “natural” 
differences were further buttressed by the 
experimenter’s assignment of each subject to 
appropriate leader-follower roles. In keeping 
with the exploratory nature of the present 
study, such a cluster of mutually supporting 
differences made salient to LS and HS subjects 
the contrast between their psychological posi- 
tions. For greater precision of understanding, 
however, it is clear that the components of 
status need to be specified in terms which 
gear more readily into psychological analysis. 
We do not know at present, for example,
whether the age-correlated differences are
more critical determinants of the results re- 
ported than the manipulated differences in 
assigned power or authority. 

The conformity results would have been 
more compelling if a more precise and repre- 
sentative sampling of the relevance dimension 
were possible. At this point, we can only say 
that the relevance dimension acted pretty 
much as it should have, and affirm our con- 
viction that the interaction between relevance 
and status is a finding of considerable impor- 
tance and one which clearly deserves replica- 
tion, extension, or qualification. If some way 
could be found to vary relevance without at 
the same time varying experience with the 
issue, this would represent an advance over 
the present design. 

Further problems of interpretation arise 
because of the fact that the dependent vari- 
able measures were obtained in a standard 
sequence. We are not in a position to judge, 
for example, whether the subjects would have 
presented themselves in the same way if this 
task had not been preceded by one in which 
they exchanged opinions with a person who 
initiated rather extreme views and yet tended 
to agree with their own opinions. We have 
attempted to shed some indirect light on the 
problem by presenting the correlations be- 
tween the various dependent variable mea- 
sures, but this cannot serve as a substitute 
for further experimentation either involving 
the component tasks in isolation or in a 
different sequence. 

One may also raise questions about the 
particular means chosen to induce social in- 
fluence on opinion issues. What is it like to 
exchange views with someone who is quite 
idiosyncratic when initiating opinions and 
yet agrees quite closely with all opinions re- 
ceived? We may only contend that the items 
were deliberately chosen so as to be circum- 
scribed and, hopefully, unrelated to broader, 
attraction determining attitudes. It is also 
true, of course, that all subjects were exposed 
to the same stimulus variations. 

In short, the planning of the present study
involved a definite decision to maximize the 
likelihood of discovery, rather than to assure 
the definite confirmation or disconfirmation
of carefully reasoned predictions. In our
judgment, the issues exposed by this study 
are sufficiently undeveloped to make the pros- 
pect of Type II errors a tolerable risk. 


SUMMARY 


The present investigation was designed to 
explore some of the different ways in which 
high and low status persons respond to pres- 
sures to be ingratiating. The subjects were 
freshmen and upperclassmen in a Naval 
ROTC program. Each freshman subject was
assigned to an upperclass partner, and vice 
versa; during the experiment they were asked 
to communicate by exchanging messages first 
on a variety of opinion issues, then on their 
own pictures of themselves, and finally, on 
their impressions of the partner. These mes- 
sages were actually intercepted and bogus 
messages substituted. The investigation was 
conducted over a 2-year period. During the 
first year, all subjects were instructed prior to 
the interaction concerning the importance of 
pair compatibility and mutual attraction. 
During the second year the importance of 
candor and “not misleading” the partner 
were stressed.

The communication tasks were designed to 
parallel the three obvious areas in which 
ingratiation tactics can occur: opinion con-
formity, self-presentation, and other enhance-
ment. The major results of the experiment,
most of which were in line with pre-experi- 
mental expectations, may be expressed in 
terms of these dependent variables: (a) 
Opinion conformity—high and low status 
subjects tended to show approximately the
same degree of conformity to each other on
issues not especially relevant to the status 
hierarchy, but the highs conformed sig- 
nificantly less than the lows on the more 
relevant issues. Ingratiation instructions 
raised the amount of conformity for both 
status groups, but did not significantly alter 
the relationship between status and relevance. 
(b) Self-presentation—high status subjects be- 
came more modest when under pressure to 
make themselves more attractive. Low status 
subjects showed the same tendency on items 
rated as “important,” but became more self- 
enhancing on the unimportant items. (c) 
Other enhancement—there was a general 
tendency for low status subjects to be more 
flattering in their appraisals of the other than 
high status subjects, and for all appraisals 
to be more favorable under ingratiation than 
under control conditions. 

The results were discussed in terms which 
considered the particular positions of the high 
and low status person, and the problems 
associated with these positions when an in- 
dividual is asked to communicate various 
kinds of information about himself. The high 
status person, or the leader, is faced with the 
problem of maintaining the respect of the 
followers without thereby becoming un- 
approachable. The low status person, or the 
follower, has the problem of assuring the 
leader of his competence without appearing 
to assume attributes which lie customarily in 
the leader’s domain. Most of the results 
could be rather nicely explained by consid- 
ering these differences in perspective as a 
function of status. 


REFERENCES 


Bales, R. F. Task roles and social roles in problem-
solving groups. In Eleanor Maccoby, T. M. New-
comb, & E. L. Hartley (Eds.), Readings in social
psychology. (3rd ed.) New York: Wiley, 1958.

Blau, P. M. A theory of social integration. Amer. J.
Sociol., 1960, 65, 545-557.

Cohen, A. R. Upward communication in experi-
mentally created hierarchies. Hum. Relat., 1958, 11,
41-53.

Deutsch, M., & Gerard, H. B. A study of norma-
tive and informational social influences upon indi-
vidual judgment. J. abnorm. soc. Psychol., 1955,
51, 629-636.

Festinger, L. A theory of cognitive dissonance.
Evanston: Row, Peterson, 1957.

Gerard, H. B. Some determinants of self-evaluation.
J. abnorm. soc. Psychol., 1961, 62, 288-293.

Goffman, E. The presentation of self in everyday
life. Garden City: Doubleday, 1959.

Heider, F. Attitudes and cognitive organization. J.
Psychol., 1946, 21, 107-112.

Heider, F. The psychology of interpersonal rela-
tions. New York: Wiley, 1958.

Homans, G. The human group. New York: Har-
court, Brace, 1950.

Jones, E. E., Gergen, K. J., & Davis, K. E. Some
determinants of reactions to being approved or
disapproved as a person. Psychol. Monogr., 1962,
76(2, Whole No. 521).

Jones, E. E., Jones, R. G., & Gergen, K. J. Some
conditions affecting the evaluation of a conformist.
J. Pers., 1963, in press.

Tagiuri, R., Bruner, J. S., & Blake, R. R. On the
relation between feelings and perception of feelings
among members of small groups. In Eleanor
Maccoby, T. M. Newcomb, & E. L. Hartley (Eds.),
Readings in social psychology. (3rd ed.) New
York: Wiley, 1958.

Thibaut, J. W., & Kelley, H. H. The social psy-
chology of groups. New York: Wiley, 1959.

Tuddenham, R. D. Correlates of yielding to a dis-
torted group norm. J. Pers., 1959, 27, 272-284.


(Received September 20, 1962) 


Vol. 77, No. 4 


Whole No. 567, 1963 


Psychological Monographs: General and Applied 


RETENTION AS A FUNCTION OF DEGREE OF LEARNING 
AND LETTER-SEQUENCE INTERFERENCE


BENTON J. UNDERWOOD AND GEOFFREY KEPPEL
Northwestern University¹


This experiment tested the letter-sequence hypothesis of forgetting produced by 
extraexperimental sources of interference. 2 paired-associate lists of single-letter 
stimuli and single-letter responses were used, the lists differing in initial associa-
tive strength between the paired letters. The letter-sequence hypothesis predicts
more rapid forgetting of the list with the lower initial associative strength.
6 different degrees of learning were given prior to the retention intervals which 
were 1 day or 7 days. The results gave no support to the letter-sequence 
hypothesis since forgetting was equivalent for the 2 lists. Degree of learning 
and rate of forgetting were inversely related. 


The present study is concerned with
retention of associations between two
letters and was designed to test certain impli-
cations of an interference theory of forgetting.
Interference theory assumes that most if not 
all forgetting of a given task is produced by 
interference from conflicting associations. Be- 
yond this basic assumption, however, the 
theory consists of a loose set of more or less 
specific assumptions concerning the operation 
of interference mechanisms. These assump- 
tions actually consist of a mixture of work- 
ing hypotheses and empirical facts about the 
detailed mechanisms of interference. To a 
large extent the specifics of an interference 
theory remain in a fluid state, being continu-
ally reshaped and modified to meet the de-
mands of new findings. The present study
tests a specific hypothesis concerning the oper- 
ation of a particular type of interference. As 
will be seen, the results appear to make neces- 
sary further modifications in the conceptualiza- 
tion of the operation of this interference. 

No attempt will be made here to give a 
general review of interference theory; this 
has been recently done by Postman (1961). 
The aspects of the theory relevant to the 
present study must, of course, be examined. 
Interference theory gains much of its em- 
pirical backing from studies of proactive in- 
hibition (PI) and retroactive inhibition (RI). 


1 This study was supported by Contract Nonr-1228(15)
between Northwestern University and the Office
of Naval Research. B. Ekstrand aided in the analyses.


The mechanisms assumed to be operating in 
PI are directly applicable to the present situ- 
ation. PI in retention is produced by conflict- 
ing associations which have been learned 
prior to the learning of the task to be re- 
called. During the acquisition of the task to 
be recalled it is assumed that previously 
learned associations which conflict with the 
learning will be extinguished. Over the re- 
tention interval, however, these extinguished 
associations are said to recover associative 
strength and will interfere with recall. There 
is independent evidence for both extinction of 
associations (Barnes & Underwood, 1959) and 
a form of spontaneous recovery of extin- 
guished associations (Briggs, 1954). Now, if 
a subject learns a single task—if no formal 
interfering task is learned prior to the learn- 
ing of the task to be recalled—it may be pre- 
sumed that the same mechanisms of interfer- 
ence are operating. That is, the subject brings 
to the laboratory a repertoire of previously 
acquired associations some of which may con- 
flict with those to be learned. During the 
learning, therefore, these conflicting associa- 
tions should be extinguished, but will recover 
over the retention interval to produce for- 
getting. This recovery process will be acceler- 
ated if the subject “uses” the extinguished as- 
sociations during the retention interval, and 
at the same time this use should extinguish 
the associations to be recalled. Thus, both 
RI and PI may be involved, and forgetting 
produced from both sources of interference 
is known to be generally greater than from
either alone (e.g., Koppenaal & O'Hara,
1962). 

It is common in RI and PI studies to use 
the A-B, A-C paradigm (successively learn- 
ing two different response terms to the same 
stimulus) in order to maximize interference 
effects. This paradigm may also be applied 
to the analysis of extraexperimental sources 
of interference to account for the forgetting 
of a verbal task when no formal interfering 
task is learned (Underwood & Postman, 
1960). Two sources of interference fitting 
this paradigm have been suggested. First, 
unit-sequence interference is said to apply to 
the interference obtaining between well-inte- 
grated verbal units, such as words. For ex- 
ample, if a subject is required to learn a 
serial list of common words, randomly or- 
dered, associations between each word and 
words not in the list undoubtedly are present 
as a consequence of linguistic usage of the 
words in the same contexts. Such associa- 
tions will be extinguished during learning, but 
will recover during the retention interval to 
interfere with the performance of the correct 
associations at recall. The second source of 
interference among verbal materials is called 
letter-sequence interference. Such a form of 
interference is applicable to verbal units which 
require integration of the letters during learn-
ing. A difficult consonant syllable provides an
illustration. Each letter in the syllable is pre- 
sumed to have strong pre-experimental asso- 
ciations with letters other than those in se- 
quence in the syllable. Therefore, these 
associations will interfere with learning, they 
will be extinguished, and their recovery will 
depress retention. It is this particular aspect 
of interference theory—the letter-sequence 
hypothesis—with which the present study is 
concerned. 

The level or degree of association between 
successive letters in a trigram, such as a con- 
sonant syllable, is directly related to the 
classical dimension of meaningfulness (Under- 
wood & Schulz, 1960). It is clear, therefore, 
that the letter-sequence hypothesis predicts 
that at least within certain limits there will 
be a direct relationship between meaningful- 
ness and retention. Or, conversely, the lower 
the meaningfulness the greater the forgetting. 
The letter-sequence hypothesis was offered in
spite of the fact that some evidence indicated
that meaningfulness of nonsense syllables is 
not a variable in retention (e.g., Underwood 
& Richardson, 1956) and in spite of the fact 
that the results of the experiment accompany- 
ing the initial elaboration of the hypothesis 
gave little evidence to support the notion 
(Underwood & Postman, 1960). However,
for reasons outlined in the latter report, it 
was not felt that the letter-sequence hypothe- 
sis had received a definitive test. Such a test 
is attempted in the present investigation.

In the present study paired-associate lists 
are used in which single letters are stimuli 
and single letters are response units. In one 
list the pre-experimental association between 
the two letters constituting a pair is very 
low (LA list) and in the other it is high (HA 
list). It would be expected that HA list 
would be learned more rapidly than LA list. 
Furthermore, according to the letter-sequence 
hypothesis, HA list should be retained better 
than LA list. The reason for this is that dur- 
ing the learning of LA list more and stronger 
conflicting associations must be extinguished 
than is true during the learning of HA list. 
Assuming that the extinguished associations 
recover, a greater amount of interference 
should impinge on the recall attempts for LA 
list than for HA list. Therefore, List LA 
should be forgotten more rapidly than List 
HA. This is a clear and unambiguous pre- 
diction from the letter-sequence hypothesis 
(Underwood & Postman, 1960): 


The letter-sequence interference gradient will be at 
a maximum when the pre-experimental associative 
strength between letters is low. The amount of such 
interference will decrease as the pre-experimental as- 
sociative connection between letters increases [p. 75]. 


A central variable in interference theory is 
degree of learning of both the interfering as- 
sociation and the association to be recalled. 
In the present study, since extraexperimental 
associations constitute the interference, only 
degree of learning of the associations to be 
recalled is manipulated. To speak of “degree 
of learning” is to accept a working concep- 
tualization concerning associations, namely, 
that associations are developed incrementally. 
More particularly, to assume that degree of 
learning is a relevant variable in forgetting 
is to assume not only that the strength of the
association must develop to a certain level
before the response term can be performed 
or anticipated correctly within the limits im- 
posed by the experimental conditions (e.g., 
the length of anticipation interval) but also 
that continued performance after initial an- 
ticipation is a relevant variable for forgetting 
theory. The mechanisms involved are straight- 
forward. Not only may continued performance 
of the association increase its absolute strength 
but also it should lead to a greater extinction 
of the interfering associations. Thus, the 
higher the degree of learning the less should 
be the observed forgetting. Recent challenges 
to this expectation, stemming from an all-or- 
none conception of learning have appeared 
(e.g., Estes, 1960). However, the classical 
literature (e.g., Krueger, 1929) as well as re- 
cent studies (e.g., Wollen, 1962) would seem 
to support a positive relationship between de- 
gree of learning and retention. In any event, 
the present study will allow another exami- 
nation of the relationship between degree of 
learning and retention, although it should be 
clear that the present interest in degree of 
learning stems from its importance as a varia- 
ble for an interference theory of forgetting. 

Finally, some consideration should be given 
to the absolute amount of forgetting to be 
expected for the present lists. These lists 
were learned by subjects who were relatively 
naive to laboratory learning and no formal 
interfering task was involved. Previous analy- 
sis has indicated that, generally speaking, 
perhaps 15%-25% forgetting may be ex- 
pected over 24 hours under such conditions 
(Underwood, 1957). However, if a strict 
interpretation of an interference theory as 
discussed above is applied, more severe for- 
getting might be predicted for the present 
lists. Consider the pair M-K as one pair in 
a list. According to theory, considerable inter-
ference would be expected in learning a list
made up of such pairs since the two letters 
rarely, if ever, occur together in words; other 
letters are far more strongly associated with 
M than is K. Over the retention interval the 
extinguished associations should recover and 
interfere with recall performance. But, be- 
yond this, during the retention interval the 
subjects will probably perform or use associations
which are contrary to M-K; they
will write letter sequences conforming to usual
letter sequences in words. This must almost 
inevitably be true over a 7-day retention 
interval with college students. Thus, the 
conditions would seem to be ideal for severe 
forgetting if an interference theory is applied 
to the conditions involved. However, as the 
results will show, retention is extraordinarily 
high. This apparent contradiction between 
fact and theory will be evaluated in the Dis- 
cussion section. 


METHOD 


Lists 


The LA list consisted of the following nine pairs:
M-K, A-G, X-V, B-W, O-Q, C-F, L-Z, U-J, and I-Y.
The HA list consisted of the following pairs: Z-X,
A-C, E-K, R-N, V-Y, S-P, B-L, M-U, and H-T.
In the letter association tables each of the LA pairs shows
zero association in that none of the 273 subjects
given the letter association procedure responded with
the response letter when presented the stimulus letter
(Underwood & Schulz, 1960). The values for
HA list range from 7 to 9, i.e., from 7 to 9 of the
273 subjects responded with the response letter when
the stimulus letter was presented. No letter is repeated
within either list.
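
The constraint that no letter is repeated within a list can be checked mechanically. The following Python sketch is offered only as an illustration and is not part of the original procedure; the function names are hypothetical, and each pair is read as two capital letters (stimulus, then response).

```python
import string

# Pairs as listed above, written stimulus-response.
LA_PAIRS = ["M-K", "A-G", "X-V", "B-W", "O-Q", "C-F", "L-Z", "U-J", "I-Y"]
HA_PAIRS = ["Z-X", "A-C", "E-K", "R-N", "V-Y", "S-P", "B-L", "M-U", "H-T"]


def letters_used(pairs):
    """Return every letter appearing in a list, stimulus and response alike."""
    return [letter for pair in pairs for letter in pair.split("-")]


def letters_not_in_list(pairs):
    """Return the letters of the alphabet that appear nowhere in the list."""
    return sorted(set(string.ascii_uppercase) - set(letters_used(pairs)))


for name, pairs in (("LA", LA_PAIRS), ("HA", HA_PAIRS)):
    used = letters_used(pairs)
    # Letters used, distinct letters used, and the letters left out of the list.
    print(name, len(used), len(set(used)), letters_not_in_list(pairs))
```

Under this reading each list uses 18 distinct letters, so 8 of the 26 letters of the alphabet fall outside each list.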

Three sources of interference for each list may be 
identified. First, other response letters within the 
list may have stronger associations to the stimulus 
letter than does the letter which is correct for learning. 
By stronger association is meant that more subjects 
in the letter association procedure responded with 
these other letters than with the letter that is correct 
for a particular stimulus letter. For LA list an aver- 
age of 6.8 response letters have stronger associations 
than does the correct letter. For HA list the cor- 
responding value is 2.7. A second source of inter- 
ference may arise from other stimulus letters; these 
stimulus letters may have stronger associations to a 
given stimulus letter than does the response letter. 
Indeed, for LA list all eight other stimulus letters 
have a value of 1 or more to each stimulus letter, 
hence the average is 8.0 potentially interfering stimulus
letters. In HA list the value is 3.1. Finally,
letters not in the list may interfere with learning 
and recall. Again, for LA list all eight letters not 
in the list had a value of at least 1 for each stimulus 
letter; therefore, the mean is 8.0. The corresponding 
value for HA list is 2.6. All of the above evidence
indicates that more interference should impinge upon 
the learning of LA list than upon HA list. 

A third list was used for one of the retention in- 
tervals. This list consisted of three pairs from LA 
list, three from HA list, and three additional pairs 
with still higher associative connection between 
stimulus and response letters than was true in HA 
list. This list was constructed to test retention as a
function of the heterogeneity of item difficulty. 
However, the learning results showed that differences
in item difficulty for this third list were in fact a
little less than for LA list so that the results on 
retention are not useful for their intended purpose. 
The fact that this list was used is mentioned so that 
certain aspects of the design can be understood. Only 
minor reference will be made to the results for this 
list. 

The lists were presented at a 2:1 second rate. 
That is, the stimulus term was exposed for 2 seconds, 
the stimulus and response terms together for 1 sec- 
ond. Anticipation learning was used throughout. 
Four different orders of the lists were used, with all 
subjects having the same start order for learning and 
the same order for recall. The intertrial interval was 
3 seconds. 


Conditions and Subjects 


For both LA and HA lists six degrees of learning 
and two retention intervals were used. Each com- 
bination of degree of learning and retention interval 
was represented by a different group of 18 subjects. 
Thus, there were 24 groups of 18 subjects each. The 
six degrees of learning were 2, 4, 6, 10, 15, and 25 
anticipation trials following the initial study trial. 
The two retention intervals were 1 day and 7 days. 
For the third list (mentioned above) only the 1-day 
retention interval was used. 

The experiment was run in two phases. In the first 
phase all conditions requiring a 1-day retention in- 
terval were administered. In this phase 18 groups of 
subjects were involved (three lists and six degrees 
of learning for each). Eighteen blocks of 18 conditions
each were constructed such that each condition
was represented once within each block and,
across all 18 blocks, occurred once in each
within-block position. The subjects
were assigned to a block in order of their appearance 
at the laboratory. A given experimenter always com- 
pleted a block of 18 subjects although not all of 
the experimenters ran the same number of blocks. 
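
The text does not state how the 18 blocks were generated. One arrangement that satisfies both stated constraints is a cyclic Latin square, sketched below in Python purely as an illustration; the function names and the numbering of conditions from 0 to 17 are assumptions made for the example.

```python
def cyclic_latin_square(n_conditions):
    """Build n blocks; block b places condition (b + position) mod n at each position."""
    return [
        [(block + position) % n_conditions for position in range(n_conditions)]
        for block in range(n_conditions)
    ]


def is_balanced(blocks):
    """Check that every condition occurs once per block and once per position."""
    n = len(blocks)
    every_condition = set(range(n))
    rows_ok = all(set(block) == every_condition for block in blocks)
    columns_ok = all(
        {block[position] for block in blocks} == every_condition
        for position in range(n)
    )
    return rows_ok and columns_ok


blocks = cyclic_latin_square(18)
print(is_balanced(blocks))
```

Any Latin square would serve equally well; the cyclic form is used here only because it is the simplest to write down.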

In the second phase, employing the 7-day retention 
interval, only the two lists (HA and LA) were in- 
volved. For scheduling in this phase 18 blocks of 12 
conditions each were made up such that each of the 
12 conditions occurred once in each block, with the
order of conditions within the blocks being ran- 
domized. In both phases of the experiment, if the 
subject was dropped for any reason (mechanical fail- 
ure, failure to return for recall, etc.), the experimenter 
replaced that dropped subject with the next new sub- 
ject appearing. All subjects were college students 
enrolled in the elementary psychology course. Finally, 
all subjects were naive to formal verbal learning
experiments.


Procedure 


Each subject was first given 10 trials on a practice 
list consisting of the names of 9 months as the stim- 
ulus terms and the numbers 1 through 9 as response 
terms, the pairing being nonsystematic (e.g., 1 was 
not the response to the stimulus January). Following
the 10 trials on the practice list the prescribed
number of trials on the experimental list was given. 
Before the subject was dismissed a reminder was 
given to return the following day (or the following 
week) to continue the experimentation. In view of 
the ambiguous effects of antirehearsal instructions, 
none was given (Underwood & Keppel, 1962). For 
the subjects given the 7-day interval a postcard was 
sent as a reminder to return at the appointed time. 

At the time of recall the subject was given very 
explicit instructions as to what was expected of him. 
These instructions indicated that he was being asked 
to recall the “letter” list, that he was to try to get 
as many response letters correct as possible on the 
first trial, and that he was to continue with the list 
until stopped. Under all conditions five relearning 
trials were given. 


RESULTS


General Evaluation


Equivalence of Groups. As noted earlier, 
the major results will concern HA and LA 
lists. A total of 24 different groups is involved, 
12 having learned LA list, 12 HA list, and 
for each list 6 having a 1-day retention inter- 
val and 6 a 7-day retention interval. Since 
the conditions for the different intervals were 
not run simultaneously, evidence is needed 
concerning the equivalence of the groups in 
learning ability. Two sets of data are availa- 
ble for assessing the equivalence. 

All subjects were given a common practice 
list. The average product-moment correlation 
for 18 subjects between total correct responses 
on the practice list and the total correct on 
the first two trials of the experimental list 
is .36 for LA list and .30 for HA list. An 
overall analysis of variance on the practice 
list scores for the 24 groups shows that none 
of the variables or interactions was significant; 
the null hypothesis cannot be rejected. Thus, 
the 24 groups appear to be drawn from the 
same population. However, perhaps a second 
test is more appropriate. The correlations 
between the total correct responses on the 
first two trials on the experimental list and 
the total correct responses on all remaining 
trials for the 20 groups having more than two 
trials were determined. These averaged .75 
for LA list and .72 for HA list. With these 
substantial correlations in mind, the critical 
question is whether the learning of the groups 
having a 1-day retention interval differs from 
those having a 7-day interval since these
groups were run at different times during the
school year. For both HA and LA lists the
F for total correct responses on the first two
trials for the six 1-day groups versus the six
7-day groups is less than 1.0.


TABLE 1

BASIC LEARNING AND RETENTION DATA
(All entries are mean correct responses, each based on N = 18)

                            Degree of learning
List                 2       4       6      10      15      25

LA, 1 day
  Last trial      4.33    5.83    6.83    8.44    8.61    8.61
  Recall          3.94    5.39    6.00    7.17    8.17    8.06
  Relearning     22.94   28.50   30.00   34.05   34.83   35.27
  Total           7.28   19.06   29.94   70.56  110.00  187.78

LA, 7 days
  Last trial      4.28    6.28    6.61    8.38    7.50    8.44
  Recall          2.06    3.33    3.00    5.67    4.89    6.00
  Relearning     21.22   25.17   25.39   33.50   30.67   33.33
  Total           7.28   19.00   29.39   69.83   96.22  185.17

HA, 1 day
  Last trial      5.22    8.22    8.56    8.78    8.56    8.67
  Recall          5.22    6.89    7.17    8.44    8.28    8.61
  Relearning     28.11   33.67   34.39   35.62   35.16   35.67
  Total           9.11   26.28   44.28   78.95  117.00  204.00

HA, 7 days
  Last trial      6.17    7.50    8.33    8.39    8.67    8.89
  Recall          4.00    4.00    4.50    6.72    6.00    6.94
  Relearning     28.83   28.89   32.50   33.78   33.72   35.17
  Total          11.22    5.39   44.11   76.39  117.22  201.67

The above evidence indicates that, statisti- 
cally speaking, the 24 groups were drawn 
from the same population. But, when so 
many groups are drawn randomly, considera- 
ble variation in the learning of the groups 
will be apparent and still the statistical con- 
clusion of “no difference” will hold. As will 
be shown, degree of learning is a very critical 
variable in the retention of these lists so that 
even small and nonsignificant differences in 
ability level of the groups add considerable 
variance to the retention data. 

Basic Learning and Retention Data. Table 
1 shows several performance measures ob- 
tained at various stages. The entry Total 
indicates the total correct responses given 
during learning. The entry Last trial indi- 
cates the number correct on the last trial of 
learning. The Relearning entry does not in- 
clude the recall trial; it is the mean total 
correct on the four trials after the recall 
trial. 

The statistical analysis of the mean total 
correct responses in learning for all 24 groups 
shows the obvious effect of degree of learning. 
The F for the retention interval for mean total 
correct is 1.83, which is far short of significance,
thus indicating in another manner the
statistical equivalence of the groups on learning
prior to the giving of the two different
retention intervals. The critical comparison
is between the learning for HA list and LA 
list. The F for this comparison is 62.03, with 
the F required for the 1% significance level 
being 6.70 (df=1/408). As may be seen in 
Table 1, at all degrees of learning the per- 
formance on List HA is better than that on 
List LA. Thus, differences in the initial 
strengths of the associations between the pairs 
of letters in the lists have produced a highly 
significant effect on learning. 
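
The degrees of freedom quoted for this comparison follow directly from the design. The short Python sketch below is illustrative only, with hypothetical variable names; it enumerates the 24 groups and derives the within-groups (error) degrees of freedom, both for the overall analysis and for an analysis confined to a single list.

```python
from itertools import product

LISTS = ("LA", "HA")
DEGREES = (2, 4, 6, 10, 15, 25)   # anticipation trials after the study trial
INTERVALS_DAYS = (1, 7)
N_PER_GROUP = 18

# One group for each combination of list, degree of learning, and interval.
groups = list(product(LISTS, DEGREES, INTERVALS_DAYS))
n_subjects = len(groups) * N_PER_GROUP

# Within-groups (error) degrees of freedom: subjects minus groups.
df_error_overall = n_subjects - len(groups)

# An analysis confined to one list involves 12 groups of 18 subjects.
groups_per_list = len(DEGREES) * len(INTERVALS_DAYS)
df_error_per_list = groups_per_list * N_PER_GROUP - groups_per_list

df_list = len(LISTS) - 1
df_degree = len(DEGREES) - 1
print(len(groups), n_subjects, df_list, df_error_overall, df_degree, df_error_per_list)
```

With 432 subjects in 24 groups, the list comparison has 1 and 408 degrees of freedom; within a single list, the degree-of-learning effect has 5 and 204.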

An adequate evaluation of the effect of the 
variables on retention cannot be made from 
the data of Table 1. Reasons for this will be 
elaborated shortly. However, to show the
nature of the analytical problems involved, 
an initial abstraction of the data is made in 
Figure 1. This figure shows the acquisition 
curves for each list and the mean number of 
items recalled for each degree of learning. 
The combined performance of the two 25- 
trial groups for each list has been used to 
exhibit the acquisition functions.

Fig. 1. Basic learning and retention functions. (Ordinate: mean number correct; abscissa: trials.)

The performance of these groups on Trials 3, 5, 7,
11, 16, and 25 (since there is no twenty-sixth 
trial) may be used to indicate the expected 
recall after a 3-second interval (the interval 
between trials) for the respective degrees-of- 
learning groups, and these values may be 
used as a base for comparison of retention 
after 1 and 7 days. Thus, in Figure 1, recall 
performance for the groups having had two 
learning trials is plotted as a third trial, re- 
call of those having four learning trials is 
plotted as a fifth trial, and so on. Subtracting 
the mean number recalled at 1 day or 7 days 
from the 3-second recall (learning) gives an 
estimate of the absolute number of items lost. 
Certain features of Figure 1 will now be 
pointed out as a basis for the more precise 
analyses to follow. 

For both lists recall after 1 day shows 
essentially no loss for high degrees of learn- 
ing; forgetting is apparent only with low 
degrees of learning. However, this does not 
appear to hold between 1 and 7 days. That 
is, the number of items lost between these 
two intervals appears to be roughly constant 
for all degrees of learning. Figure 1, as well 
as Table 1, indicates that the recall is higher 
for HA list than for LA list. However, it 
can also be seen that for comparable degrees 
of learning the performance in learning is 
higher for HA list than for LA list. There- 
fore, comparisons in retention between the two 
lists are not appropriate at equal formal de- 
gree-of-learning points since the amount ac- 
tually learned differs at those points. A 
different method of analysis will be required 
to determine if there are differences in reten- 
tion of the two lists. First, however, attention 
will be directed to the analysis of the effects 
of degree of learning. 


Degree of Learning and Recall 


According to an interference theory, asso- 
ciations with low associative strength will be 
interfered with more at recall than will those 
with high associative strength. To test this 
expectation the appropriate response measure 
is the proportion of items lost over the retention
intervals at the varying degrees of learning.
The first analysis of the effects of degree
of learning on recall will derive such a measure
for each subject.

Percentage of Retention and Degree of 
Learning. To obtain a percentage of retention 
score for each subject, several preliminary 
steps are involved. These steps produce a 
score for each subject of the number of items 
he would have been expected to get correctly 
had there been another learning trial. On 
the average, of course, the best estimate of 
such a score for each subject would be the 
mean learning score for all other subjects 
having learning trials beyond that of the 
subject involved. Thus, in Figure 1, the per- 
centage of retention for a subject having two 
learning trials would be the percentage his 
recall is of the mean third trial learning score 
of the subjects having more than two learn- 
ing trials. That is, the third learning trial 
gives the expected recall after 3 seconds. 
However, the variability of the percentage of 
retention scores can be reduced by getting 
an expected score for each subject based only 
on other subjects who learn at approximately 
the same rate as the given subject. Further- 
more, since the items within each list differ 
somewhat in difficulty, a separate expected 
score may be obtained for each item for a 
given ability level. In view of these consid- 
erations, several steps were used in evolving 
the percentage of retention for each subject. 
First, a distribution of scores for the 216 
subjects for each list on the first two trials 
of learning was drawn up. Each distribution 
was divided into six groupings representing 
different ability levels in learning. The N
within each group varied from 24 to 49. For 
each ability group, acquisition curves were 
determined for each item, these curves being 
based on all subjects within a group regardless 
of degree of learning given. Obviously, the 
number of cases determining the curves was 
greater at lower degrees of learning than at 
higher degrees of learning. Each point for a 
curve represented the percentage of subjects 
getting the item correct, a point being availa- 
ble for each of the 25 trials. A smooth curve 
was drawn through these points. From each 
such curve—a different one for each ability 
group for each item for each list—the pro- 
portion of subjects who would have gotten 
the item correct on Trials 3, 5, 7, 11, 16, and
26 could be determined. Summing these proportions
across items gives the expected scores
for the subject in a given ability group on 
Trials 3, 5, 7, and so on. That is, this score 
tells how many of the nine items the subject 
would have been expected to anticipate cor- 
rectly had learning continued with only the 
3-second pause between trials. These expected 
scores vary for each ability group, the differ- 
ences being rather large at low degrees of 
learning. Finally, the percentage of retention 
for each subject was determined by dividing 
his actual recall score after 1 day (or after 
7 days) by the expected score and multiplying 
by 100. These scores become the entry of 
percentage of retention for each subject of 
the originally constituted groups of 18 sub- 
jects each. These scores obviously cannot be 
less than 0, but they may go above 100% 
if the subject recalls more than expected for 
his ability group, a situation which occurred 
a number of times. 
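
The final two steps of this procedure reduce to a sum and a ratio. The Python sketch below illustrates them under stated assumptions: the function names are hypothetical, and the nine proportions are invented values standing in for readings taken from the smoothed item curves of one ability group; they are not data from the study.

```python
def expected_score(item_proportions):
    """Sum, over items, the proportion of the ability group expected to be correct."""
    return sum(item_proportions)


def percentage_of_retention(recall_score, expected):
    """Express actual recall as a percentage of the expected (3-second) score."""
    if expected <= 0:
        raise ValueError("expected score must be positive")
    return 100.0 * recall_score / expected


# Hypothetical proportions for the nine items at Trial 3 (illustration only).
hypothetical_proportions = [0.75, 0.5, 0.5, 0.25, 0.5, 0.75, 0.5, 0.25, 0.5]
expected = expected_score(hypothetical_proportions)

# A subject who recalls 3 items after the retention interval.
print(round(percentage_of_retention(3, expected), 2))
# A subject who recalls 5 items exceeds the expected score, giving more than 100%.
print(round(percentage_of_retention(5, expected), 2))
```

Because the expected score is a group estimate, an individual subject can recall more than expected, which is how percentages above 100 arise.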

The mean percentage of recall scores for 
both lists are shown in Table 2. Since the 
distributions on which the means are based 
were not seriously skewed, analysis of vari- 
ance has been applied directly to the percent- 
age scores, the analysis being done separately 
for each list. These analyses lead to the same 
conclusion for both lists, namely, that length 
of retention interval is highly significant 
(more forgetting after 7 days than after 1 
day), and that the degree of learning influ- 
ences recall. For HA list the F for degree of 
learning is 9.03, and for LA list, 5.19. With 
5 and 204 degrees of freedom, an F of 3.11 is 
needed for the 1% significance level.


TABLE 2

MEAN PERCENTAGE OF RECALL SCORES AS A FUNCTION
OF LISTS, DEGREE OF LEARNING, AND
RETENTION INTERVAL

Degree of         LA list (a)           HA list (b)
learning        1 day    7 days       1 day    7 days

    2           74.56     38.22       81.72     54.39
    4           76.56     48.78       88.95     50.83
    6           89.67     43.39       85.39     54.50
   10           87.06     68.44       98.00     78.28
   15           98.22     58.72       94.06     68.50
   25           93.50     69.33       96.89     77.56

(a) σm based on within-groups variance is 6.28.
(b) σm based on within-groups variance is 4.34.


While the relationship between degree of learning
and recall is by no means regular, it is clear 
from Table 2 that in general as degree of 
learning increases the percentage of recall 
increases. 

Two additional matters relevant to the recall
scores in Table 2 need evaluation. As
noted, while degree of learning produces a 
highly significant effect on recall, a direct 
and smooth relationship is not apparent. 
Some of the irregularity appears to be due 
to the fact, mentioned earlier, that the groups 
differed somewhat in learning ability, and 
since recall is sensitive to the amount learned, 
some of the scores appear to be out of line 
because of this fact and not because a regular 
relationship between degree of learning and 
retention does not exist. For example, the 
fortunes of randomization produced subjects 
in three of the four 10-trial groups whose 
average learning speed was consistently above 
the average of the other groups having the 
same list and retention interval. Thus, the 
apparent leveling of the recall curves between 
10 and 25 trials probably is due in part to 
the fact that the scores for the 10-trial groups 
are “too high.” It may be noted that the 
percentage of recall for the third list (see 
Method section) shows quite regular incre- 
ments in recall as degree of learning increases. 
Nevertheless, the small differences in recall 
between 10 and 25 degree-of-learning trials, 
even after 7 days, suggest a relatively slight
effect of the additional trials beyond 10. This 
issue will be considered more fully later. 

The second matter concerns the relationship 
between degree of learning and forgetting be- 
tween 1 day and 7 days. Between 3 seconds 
and 1 day, a greater percentage of items is 
lost for low than for high degree of learning. 
A continued further greater loss for low than 
for high degree of learning, to be expected 
by interference theory, is not apparent be- 
tween 1 day and 7 days in Table 2. The 
interaction term (degree of learning × retention
interval) is not significant for either
list. However, the 7-day values are based on 
the expected scores after 3 seconds. The 
most appropriate question is to ask if there 
is a continued greater forgetting for low than 
for high degree of learning between 1 and 7 
days, since the degree-of-learning effect for
7 days in Table 2 may only be reflecting the
effect which took place between 3 seconds 
and 1 day. To answer this question, a per- 
centage of the retention score for each subject 
in the 7-day groups was obtained, using the 
mean recall of the corresponding degree-of- 
learning group after 1 day as the base. Again, 
variations in the performance of the groups 
resulted in the lack of a regular relationship 
between degree of learning and percentage of 
retention between 1 day and 7 days. How- 
ever, some stabilization was achieved by com- 
bining the scores for groups having 2, 4, and 
6 degree-of-learning trials to form a single 
group of 54 subjects having relatively low 
degrees of learning, and combining those 
having 10, 15, and 25 trials to form a group 
of 54 subjects having relatively high degrees 
of learning. For LA list the mean percentage 
of retention is 55% for low degree of learning 
and 71% for high; for HA list the corre- 
sponding values are 66% and 78%. The 
differences between the percentages are sig- 
nificant for each list, the F being 6.48 for 
LA list and 4.90 for HA list; with 1 and 106 
degrees of freedom the F required for the 
5% level is 3.90. It may be concluded that
forgetting is greater for low degrees of learn- 
ing than for high degrees of learning after 1 
day and also between 1 and 7 days. 
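
The comparison just described can be sketched in two steps: each 7-day recall score is expressed as a percentage of the 1-day mean for the same degree of learning, and the low and high combined groups are then compared by a one-way analysis of variance. The Python code below is a minimal illustration with hypothetical function names and invented scores; it is not a reconstruction of the original computations.

```python
from statistics import fmean


def percent_of_base(seven_day_recalls, one_day_mean):
    """Express each 7-day recall score as a percentage of the 1-day group mean."""
    return [100.0 * score / one_day_mean for score in seven_day_recalls]


def one_way_f(*groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_scores = [x for group in groups for x in group]
    grand_mean = fmean(all_scores)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    if ss_within == 0:
        raise ValueError("no within-groups variance; F is undefined")
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Invented scores, for illustration only.
print(percent_of_base([3, 6], one_day_mean=6.0))
print(one_way_f([1, 2, 3], [2, 3, 4]))
```

With two combined groups of 54 subjects each, the same function yields 1 and 106 degrees of freedom, in agreement with the values given above.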

Probability Analysis and Degree of Learning.
Another way of viewing the relationship
between degree of learning and retention is 
by the use of a probability analysis in which 
only items which have been correctly antici- 
pated at least once in learning are considered 
(Underwood, 1954). For this analysis, the 
results for both lists were combined. An ac- 
quisition function was first derived from the 
four groups having 25 trials. In determining 
this function a tally was made of the number 
of times each item was given correctly on 
the trial immediately after having been given 
correctly 1, 2, 3, etc., times. This curve is 
the upper one in Figure 2. It should be noted 
that on the average after the first correct 
response about 80% of the items were cor- 
rect on the immediately following trial. How- 
ever, this value varies appreciably as a func- 
tion of the ability level of the subject. For 
the six ability groupings discussed earlier, 
the percentage correct on trial after first correct
varied from 92% for fast learning subjects
to 59% for slow learning subjects for
LA list, with the corresponding values being 
98% and 72% for HA list. Even after two 
correct anticipations, large differences in abil- 
ity level were still evident. That is, after two 
correct anticipations the percentage correct 
on the trial immediately following the second 
correct anticipation was 92% for fast learning 
subjects and 78% for slow learning subjects 
on LA list, with values of 98% and 83% for 
HA list. 
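
The tally underlying the acquisition function can be illustrated with a short Python sketch. It assumes one reading of the procedure: for each item, every correct anticipation is counted as the first, second, third, and so on, and the outcome of the immediately following trial is recorded. The function name and the two trial records are hypothetical.

```python
from collections import defaultdict


def next_trial_function(records):
    """Map k prior correct anticipations to the percentage correct on the next trial.

    Each record is one item's sequence of outcomes over trials (1 = correct).
    """
    opportunities = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        n_correct = 0
        # The last trial has no following trial, so it is excluded as a starting point.
        for trial, correct in enumerate(record[:-1]):
            if correct:
                n_correct += 1
                opportunities[n_correct] += 1
                hits[n_correct] += int(bool(record[trial + 1]))
    return {k: 100.0 * hits[k] / opportunities[k] for k in sorted(opportunities)}


# Two invented item records of five trials each.
records = [[0, 1, 1, 0, 1], [1, 0, 1, 1, 1]]
print(next_trial_function(records))
```

Running the same tally separately for each ability grouping would give the kind of breakdown by learning speed reported above.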

The acquisition function in Figure 2 has 
only a slight bias due to item drop out, i.e., 
difficult items being given correctly only a 
few times by slow learning subjects are rep- 
resented only in the initial portion of the 
curve. The evidence on this is as follows. 
With four groups of subjects of 18 each, 
learning 9 pairs, there is a total of 648 items 
available. All items were given correctly at 
least twice; 99.2% were given correctly at least 
5 times, 96.6% at least 10 times, and 90.3% 
at least 15 times. Thus, only very late in the 
acquisition process does a serious loss of items 
start to occur; therefore, the easier items and 
fast learning subjects are more heavily rep- 
resented in the latter portion of the curve 
than in the early portion. The effect of this 
can only be to increase the level of the curve 
which is nearly asymptotic after 10 trials 
anyhow. 

Fig. 2. Relationship between number of prior correct
anticipations and percentage of correct responses
on the immediately following trial during learning
and after 1 day and 7 days. (Ordinate: percentage
correct on next trial; abscissa: number of correct
responses.)

For determining the recall functions all
degrees of learning were combined for both
lists. For each item the number of correct
responses given during learning was noted 
along with whether or not the item was cor- 
rect at recall. Items having an equivalent 
number of correct anticipations during learn- 
ing were grouped and the percentage correct 
at recall calculated. Since all degrees of learn- 
ing were combined, it should be noted that 
there can be no systematic bias in the curve, 
at least up to about 15 correct anticipations. 
That is, slow and fast learning subjects are 
represented fairly equally throughout the 
range of response strengths as well as are 
easy and difficult items. Since the recall 
curves are very irregular, smoothed curves 
have been drawn by inspection. It is clear
that there is a general increase in the per- 
centage correct at recall as the number of 
correct anticipations during learning increases. 
If overlearning is defined as that range in the 
curve for which additional correct anticipa- 
tions produce very little increase in the per- 
centage correct during learning, this range 
starts at about 10 correct anticipations. There 
appears to be some increase in recall with 
increases in the number of correct anticipa- 
tions beyond this point. It should be noted 
that unlike the recall curves of Figure 1, 
Figure 2 includes only items which were got- 
ten correctly at least once during learning. 
Thus, Figure 2 shows that there is a distinct 
advantage for retention in repeating items 
beyond the first correct anticipation. Ac- 
cording to interference theory, such repeti- 
tions lead to a greater and greater extinction 
of the interfering associations as well as in- 
creasing the strength of the association being 
learned. 
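
The grouping used for the recall functions can be sketched as follows. The Python code is illustrative only: the function name and the item data are hypothetical, and items never anticipated correctly during learning are excluded, as in the analysis described.

```python
from collections import defaultdict


def recall_by_strength(items):
    """Percentage correct at recall for each number of correct anticipations in learning.

    `items` holds one (n_correct_in_learning, recalled) pair per item per subject.
    """
    totals = defaultdict(int)
    recalled_counts = defaultdict(int)
    for n_correct, recalled in items:
        if n_correct < 1:
            continue  # only items given correctly at least once are considered
        totals[n_correct] += 1
        recalled_counts[n_correct] += int(bool(recalled))
    return {n: 100.0 * recalled_counts[n] / totals[n] for n in sorted(totals)}


# Invented item data, for illustration only.
items = [(0, False), (1, True), (1, False), (3, True), (3, True)]
print(recall_by_strength(items))
```

The resulting points are raw percentages; the smoothing by inspection mentioned in the text is not reproduced here.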

Within-Subject Differences in Degree of 
Learning. To obtain further data relevant to 
the forgetting between 1 day and 7 days, a 
third way of viewing the relationship between 
degree of learning and recall was used. Re- 
call was analyzed as a function of differences 
in strength among the nine associations for 
each subject. On the assumption that the 
number of correct anticipations is directly 
related to associative strength, the nine items 
for each subject were rank ordered from 1 
(strongest) to 9 (weakest). Ties in number 
of correct anticipations were resolved by a 
random method so that nine whole-numbered 


ranks were given each subject. For each 
rank there are 18 subjects for each formal 
degree of learning, or a possibility of a maxi- 
mum of 18 correct anticipations at recall. The 
question asked is what percentage is the 7-day
recall of the 1-day recall for the different
ranks. For purposes of stability, the
results for successive groups of three ranks 
have been combined. These groups are labeled 
“strong,” “moderate,” and “weak” in Figure 
3. Also, to represent low and high formal 
degree of learning, the results for the 4- and 
6-trial groups have been combined (the 2- 
trial groups had too many 0 scores for items) 
as have those for the 15- and 25-trial groups. 
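
The ranking step can be illustrated with a short Python sketch. The text specifies only that ties were resolved by a random method; the particular tie-breaking rule below, the function names, and the example counts are assumptions made for the illustration.

```python
import random


def rank_items(correct_counts, rng=None):
    """Rank items from 1 (most correct anticipations) to n, breaking ties at random."""
    rng = rng or random.Random()
    order = sorted(
        range(len(correct_counts)),
        # Sort by descending count; a random draw settles the order among ties.
        key=lambda i: (-correct_counts[i], rng.random()),
    )
    ranks = [0] * len(correct_counts)
    for rank, item_index in enumerate(order, start=1):
        ranks[item_index] = rank
    return ranks


def strength_label(rank):
    """Combine successive groups of three ranks into strong, moderate, and weak."""
    if rank <= 3:
        return "strong"
    if rank <= 6:
        return "moderate"
    return "weak"


# Invented counts of correct anticipations for one subject's nine items.
counts = [5, 5, 2, 9, 0, 1, 3, 3, 7]
ranks = rank_items(counts, random.Random(0))
print(ranks, [strength_label(r) for r in ranks])
```

Passing a seeded generator makes the tie-breaking reproducible; the tied items still always occupy adjacent ranks.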

Figure 3 shows that, for both lists, per- 
centage of retention increases as the item 
strength among the nine items for a subject 
increases. This is to say that the percentage 
of forgetting between 1 day and 7 days is 
greater the lower the degree of learning of the 
associations for a subject. Differences in the 
level of the curves for a given list represent 
the influence of formal degree of learning as 
discussed earlier. Differences between the 
two lists represent the overall higher level 
of learning achieved for HA list than for LA 
list for a constant number of trials. 

Fig. 3. The relationship between item strength and
percentage of retention between 1 day and 7 days.
(Ordinate: percentage of retention, 1 day to 7 days;
abscissa: item strength, from weak to strong.)

It might be supposed that the relationship
between the strength of an item and the percentage
of forgetting as exhibited by the four
curves in Figure 3 actually represents the
relationship between recall and the ease or
difficulty of learning an item over and above
the degree of learning which is correlated 
with ease or difficulty. It is doubtful that 
this is true as subsequent comparisons be- 
tween HA list and LA list will show. 

It may be concluded that all manners of 
viewing the data which have been used arrive 
at the same conclusion, namely, that the 
lower the strength of an association the more 
rapid the forgetting. This law holds between 
3 seconds and 1 day, and between 1 and 7 
days. 


Relearning 


To obtain the percentage of retention scores 
as measured by relearning, the same procedure 
was followed as that used to derive the per- 
centage of recall scores. The initial step was 
to get expected scores following a 3-second 
retention interval. Thus, for the groups given 
two learning trials, the total number of cor- 
rect responses a subject would have gotten 
(had he continued learning) on Trials 4, 5, 
6, and 7 was determined. For the groups 
given four learning trials the determination 
was made for Trials 6, 7, 8, and 9, and so on. 
Thus, the recall trial is not here considered 
in the relearning scores. Expected scores for
the four relearning trials were determined 
separately for the six ability groups and for 
each item in the list separately. The values 
were obtained from the acquisition curves 
originally constructed to obtain expected re- 
call. The percentage of retention for a sub- 
ject was then determined by dividing the total 
correct responses given on the four relearning 
trials after 1 day (or after 7 days) by the 
number expected had learning been continu- 
ous. Again, it should be noted that the per- 
centage scores may fall above 100%. 
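
The choice of trials for the expected relearning score can be made explicit with a small Python sketch. The function names and the expected values are hypothetical; the only rule taken from the text is that, after n learning trials, the trial corresponding to recall is skipped and the next four trials are used.

```python
def relearning_trials(n_learning_trials, n_relearning=4):
    """Return the trial numbers whose expected scores serve as the relearning base."""
    first = n_learning_trials + 2  # trial n + 1 corresponds to the recall trial
    return list(range(first, first + n_relearning))


def relearning_percentage(relearning_correct, expected_by_trial, n_learning_trials):
    """Total correct on the relearning trials as a percentage of the expected total."""
    trials = relearning_trials(n_learning_trials, len(relearning_correct))
    expected_total = sum(expected_by_trial[t] for t in trials)
    return 100.0 * sum(relearning_correct) / expected_total


print(relearning_trials(2))   # trials used for the 2-trial groups
print(relearning_trials(4))   # trials used for the 4-trial groups

# Invented expected scores for one ability group, for illustration only.
hypothetical_expected = {4: 5.0, 5: 6.0, 6: 6.5, 7: 7.5}
print(relearning_percentage([6, 7, 7, 8], hypothetical_expected, 2))
```

As with recall, a subject who relearns faster than the ability-group estimate obtains a score above 100%.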

The results are shown in Table 3, Gen- 
erally speaking, the results parallel those 
found on recall. Relearning is slower after 
7 days than after 1 day, although very little 
decrement is noted after 1 day. Analyses of 
variance were performed on the scores for 
each list separately. The length of the inter- 
val is highly significant for both lists, but 
degree of learning is significant only for LA 
list. The F for this list is 4.22, and with 5 
and 204 degrees of freedom this is significant 


TABLE 3 


MEAN PERCENTAGE OF RELEARNING SCORES 


Degree of         LA listᵃ              HA listᵇ
learning      1 day    7 days       1 day    7 days
    2         85.94     79.11       94.72     90.22
    4         94.17     86.17      102.78     87.44
    6        102.94     85.67       99.83     95.95
   10         96.56     99.28      101.33     96.39
   15        102.22     91.11       99.22     95.06
   25        102.06     95.50       99.78     98.22


ᵃ σm based on within-groups variance is 4.24.
ᵇ σm based on within-groups variance is 2.58.


beyond the 1% level. The comparable F for 
HA list is 1.92, which falls short of the 5% 
level. The interactions between degree of 
learning and length of interval did not ap- 
proach significance. 


HA List versus LA List 


The letter-sequence hypothesis predicts that 
LA list will show more rapid forgetting than 
HA list. The LA list was learned more 
slowly than HA list, indicating that the initial 
associative strength between the letters of the 
pairs was lower for LA list than for HA list. 
It would also be expected that a greater num- 
ber of potentially interfering tendencies 
should fall on LA list at recall. To compare 
the recall of the two lists requires that the 
degree of learning for the two lists be equiva- 
lent wherever comparisons are made. Such 
comparisons cannot be made at points cor- 
responding to the manipulated degree of 
learning, since with equivalent number of 
trials for the two lists the obtained degree of 
learning was in fact higher for HA list than 
for LA list. The comparison may be made, 
however, by plotting the percentage of recall 
for both lists against a baseline of expected 
recall, i.e., expected recall after the 3-second 
intertrial interval in learning. These expected 
recall scores for each group of 18 subjects 
are simply the means of the expected scores 
as derived from the procedure discussed ear- 
lier to determine the percentage of retention 
for each group. 

The plot is given in Figure 4. It is ap- 
parent that wherever the degree of learning 
(the expected values) for the two lists over- 
lap there is no consistent difference in the 




[Figure 4. Ordinate: percentage of recall. Abscissa: expected 3-sec. recall. Separate curves for LA list and HA list.]

Fig. 4. A comparison of the recall of the two lists 
with expected recall as the measure of degree of 
learning. 


percentage of recall. A fairly accurate statis- 
tical comparison may be made by comparing 
the recall for the five groups having the high- 
est degree of learning for LA list against the 
five groups having the lowest degree of learn- 
ing for HA list, since the two ranges of for- 
mal degree of learning cover essentially 
equivalent ranges of expected recall. The 
results of this analysis show highly significant 
effects of degree of learning and of interval, 
but the F for lists is only .89. It must be 
concluded that when the two lists are equated 
for true degree of learning—expected recall 
—there is no difference in rate of forgetting. 
A comparable plot and analysis for the re- 
learning scores yielded the same conclusion. 

One final aspect of Figure 4 may be pointed 
out. If the recalls for the two lists are con- 
sidered together, making 12 different expected 
recall values for each interval, an analysis of 
the trends by orthogonal polynomials shows 
that the only significant component is the 
linear one. This suggests that when degree 
of learning is measured by expected recall 
there is a linear relationship between it and 
recall after 1 and after 7 days. Furthermore, 
since the linear trend is upward, Figure 4 
shows clearly that when degree of learning is 
plotted in terms of expected recall, rather 
than trials, there is a continuous and positive 
relationship between degree of learning and 
recall. 

Another way of showing the equivalence in 
retention of the two lists is to compare items 
which have equal expected values but differ- 
ent numbers of acquisition trials. It will be 


remembered that acquisition curves were 
constructed for all items. From such curves, 
equivalent 3-second recall could be determined 
for pairs of items, one from HA list and one 
from LA list. For example, the expected re- 
call for Z-X from HA list after 2 learning 
trials was 77.1%, and for L-Z from LA list 
after 6 trials, 78.0%. The pair V-Y from 
HA list after 10 trials had an expected value 
of 92.9%, and the pair M-K from LA list 
had an expected value of 93.5% after 25 
trials. Will the retention of such items differ? 
A total of 36 pairs of comparable items could 
be obtained from the two lists allowing no 
more than 1% difference in expected scores. 
No item from HA list for a given degree of 
learning entered into more than one com- 
parison, although some items from LA list 
at a given degree of learning were used more 
than once. In the latter case, of course, it 
was compared with a different item from 
HA list. 
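
The pairing rule can be illustrated with the four items named above. The sketch below is not the authors' procedure: choosing the closest LA item is an assumption (the text requires only a difference of no more than 1%), and the function name and the third HA entry are hypothetical.

```python
def match_items(ha_items, la_items, tolerance=1.0):
    """Pair each HA item with the closest LA item whose expected recall
    lies within `tolerance` percentage points. Each HA item is used once;
    an LA item may be reused."""
    pairs = []
    for ha_label, ha_expected in ha_items:
        la_label, la_expected = min(
            la_items, key=lambda item: abs(item[1] - ha_expected)
        )
        if abs(la_expected - ha_expected) <= tolerance:
            pairs.append((ha_label, la_label))
    return pairs


# Expected 3-second recall (%) for the items named in the text,
# plus one hypothetical HA entry that has no LA counterpart.
ha_items = [("Z-X, 2 trials", 77.1), ("V-Y, 10 trials", 92.9),
            ("hypothetical, 25 trials", 99.0)]
la_items = [("L-Z, 6 trials", 78.0), ("M-K, 25 trials", 93.5)]

matched = match_items(ha_items, la_items)
```

Only the first two HA entries find a partner; the hypothetical third is dropped because no LA item falls within one percentage point of it.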

The number of times each item in each of 
the 36 pairs was correctly recalled after 1 
and after 7 days was listed. The difference 
in the number correctly recalled for each 
pair was then determined, always subtracting 
the recall of the item from LA list from the 
recall of the item from HA list. A separate 
distribution of these difference scores was ob- 
tained for 1 and 7 days. If degree of learning, 
rather than item difficulty, is the only critical 
factor involved, the mean of the distributions 
of difference scores should not differ significantly 
from zero. The mean difference score 
for the 1-day recall was -.25 items, and for 
7 days, -.50. The respective t's are .57 and 
.77. Clearly, with expected recall after 3 
seconds equalized, the retention of the items 
from the two lists does not differ. 
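
The test on the difference scores amounts to a one-sample t test against zero. A minimal sketch follows; the six difference scores are hypothetical stand-ins for the 36 actually obtained, and the function name is an assumption.

```python
import math
import statistics


def one_sample_t(differences):
    """Return (t, df) for the hypothesis that the mean difference is zero."""
    n = len(differences)
    mean = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample SD, n - 1 in the denominator
    return mean / (sd / math.sqrt(n)), n - 1


# Hypothetical HA-minus-LA recall differences for six matched pairs.
diffs = [1, -2, 0, -1, 2, -3]
t, df = one_sample_t(diffs)
```

A negative mean difference, as in the data reported here, indicates slightly better recall of the LA items; the t value shows whether that departure from zero exceeds what sampling variation would produce.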

The letter-sequence hypothesis predicted 
better retention of HA list than LA list. The 
evidence reviewed in this section gives abso- 
lutely no support to the prediction, hence to 
the hypothesis. 


Overt Errors 


Learning. A complete analysis of the overt 
errors for the first 10 learning trials on each 
list has been made, using 108 subjects for 
each list who were given 10 or more trials. 




For all 10 trials the error frequency was 
greater for the subjects learning LA list than 
for those learning HA list. For each response 
attempt, the subjects learning LA list had a 
greater proportion of errors than did those 
learning HA list. However, whenever amount 
learned or number of attempts was equated, 
the error rate for the subjects learning the 
different lists did not differ. 

All errors were classified in one of three 
categories: R errors, in which a response 
letter in the list was given to the wrong 
stimulus letter; S errors, in which a stimulus 
letter was given as a response; and Imports, 
in which a letter not in the list was given as 
a response. The imports constituted 10% of 
the total errors on the first trial (two lists 
combined) but dropped in frequency very 
rapidly. Furthermore, only a few subjects 
were responsible for the Imports, a single 
subject sometimes repeating the same letter 
on successive trials. 

On the first anticipation trial, R errors 
constituted 62% of the total errors made, 
S errors 28%, for both lists combined. On 
successive trials, however, S errors dropped 
out more rapidly than did R errors. By the 
tenth trial, approximately 90% of all errors 
were R errors, 10% being S errors. 

Recall Errors. More errors were made dur- 
ing the recall of LA list than during the recall 
of HA list. However, this may be attributed 
entirely to the different degrees of learning 
of the two lists and subsequent better recall 
of HA list. Since an appreciable number of 
subjects made no overt errors at recall, an 
analysis of the error frequency was made by 
items. Thus, the total number of errors made 
in recalling each of the nine items was de- 
termined. The total number of errors made 
was added to the number of correct responses 
to determine the total response attempts. 
This total was used as a base to determine 
the percentage of error per attempt for each 
item. 
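
The per-item error measure can be written out as follows. This is a sketch only; the function name and the counts are hypothetical.

```python
def percent_error_per_attempt(errors, correct):
    """Overt errors as a percentage of all response attempts
    (errors plus correct responses) for one item."""
    attempts = errors + correct
    if attempts == 0:
        return 0.0  # no overt responses at all for this item
    return 100.0 * errors / attempts


# Hypothetical item: 12 overt errors and 48 correct responses.
rate = percent_error_per_attempt(12, 48)
```

Omissions do not enter the base, so the measure describes the quality of the responses actually attempted rather than overall recall.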

The error frequency was found to be less 
the higher the degree of learning, but the 
trends of the curves between 1 and 7 days 
were remarkably similar. To demonstrate 
what may be considered error-recovery curves, 
the results for all degrees of learning and for 
both lists have been combined to produce the 


[Figure 5. Ordinate: percentage of error per overt response. Abscissa: retention interval (1 day, 7 days). Separate curves for R errors, S errors, and Imports.]

Fig. 5. Overt-error frequency in recall at 1 day and 
at 7 days. 


data from which Figure 5 was plotted. The 
values on the ordinate are the percentage of 
attempts which resulted in overt errors. Sepa- 
rate curves are shown for the three error 
categories. The frequency for all three cate- 
gories increased with time. Furthermore, it 
appears that R errors recovered more rapidly 
than either S errors or Imports. This might be 
expected on the grounds that these latter errors 
dropped out more quickly in learning, hence 
would be more thoroughly extinguished, than 
would be the case for R errors. As a rough 
estimate of the statistical significance of the 
difference in change in error frequency for 
S and R errors between 1 and 7 days, difference 
scores for each item were obtained. For 
each type of error the percentage value at 1 
day was subtracted from that for 7 days for 
the 18 items (9 in each list). These differences 
were obtained following an arcsin transformation 
of the percentage scores. The t 
for the mean difference between these two 
distributions of difference scores was 2.25; 
with 17 degrees of freedom this is significant 
beyond the 5% significance level. It may be 
concluded tentatively that S errors recover 
more slowly than do R errors. 
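
The comparison of change scores can be sketched as below. Several points are assumptions rather than details given in the text: the transformation is taken to be the common arcsin of the square root of the proportion, the comparison is treated as a paired t test over items, and all names and data are hypothetical (four items rather than 18).

```python
import math
import statistics


def arcsin_transform(percent):
    """Angular transformation of a percentage, in radians (assumed form)."""
    return math.asin(math.sqrt(percent / 100.0))


def change_scores(day1, day7):
    """Transformed 7-day value minus transformed 1-day value, per item."""
    return [arcsin_transform(b) - arcsin_transform(a) for a, b in zip(day1, day7)]


def paired_t(x, y):
    """Return (t, df) for the mean of the item-by-item differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n)), n - 1


# Hypothetical error percentages for four items.
r_change = change_scores([10, 12, 8, 15], [20, 25, 15, 30])
s_change = change_scores([4, 5, 3, 6], [6, 7, 5, 8])
t, df = paired_t(r_change, s_change)
```

With 18 items the same computation gives the 17 degrees of freedom reported above.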

Nature of Recall Errors. Two points rela- 
tive to the nature of the recall errors will be 
made. The first concerns differences in errors 
for the two lists. Use of the letter-association 
tables (Underwood & Schulz, 1960) allows 
an evaluation of the strength of the associa- 




tions which resulted in errors. However, a 
distinction must be made between “seen” 
and “unseen” errors. A seen error occurs when 
a subject gives a stimulus or response letter 
as an error which he has seen previously dur- 
ing the recall trial. For example, in LA list, 
the first pair presented on the recall trial was 
M-K. If K or M were given later as a re- 
sponse on the recall trial, it would be classi- 
fied as a seen error. Such errors are difficult 
to evaluate by normative data since the re- 
cency of having seen the letter may act as a 
priming influence. The unseen errors—the 
letters given prior to their appearance on the 
recall trial—should be relatively free from 
this recency effect and the letter-association 
norms may be applicable. 

The evaluation of the unseen errors by the 
norms shows that the general level of error 
strengths is appreciably higher for HA list 
than for LA list. Combining all degrees of 
learning, the results show that after 1 day 
LA list produced errors of which 41% had 
values of 6 or more in the association tables, 
with 44% after 7 days. For HA list the per- 
centage of errors with a value of 6 or more 
was 78% at 1 day and 85% at 7 days. Thus, 
the strengths of the error tendencies for each 
list (as judged from the association tables) 
have some level of appropriateness for the 
list which produced them as far as the correct 
associations are concerned. While the strength 
of the correct associations after learning is 
not known relative to the incorrect associa- 
tions as obtained from the norms, the error 
strengths for LA list are lower, in general, 
than those for HA list. Thus, a subject does 
not respond at random; the strengths of the 
overt errors for the two lists differ in the 
direction of being appropriate for the initial 
strength of the correct associations. This 
observation should not be interpreted to mean 
that very strong interfering associations which 
did not produce overt errors are also irrelevant 
to the interference at recall. Many transfer 
studies using the A-B, A-C paradigm have 
shown heavy interference with but few overt 
errors. It is as if a subject does not know what 
the correct response term is (because of inter- 
ference) but he knows that the interfering item 
is incorrect. This can only mean that overt 


errors have limited usefulness for analyzing 
interference effects. 

The second point relative to the overt er- 
rors at recall concerns their type. In nearly 
every case where a given error occurred with 
high frequency, i.e., was given by many dif- 
ferent subjects, an alphabetical sequence ha- 
bit could be identified. Several examples will 
be given. Among the subjects learning LA 
list, 14 gave Y as a response to X after 1 
day and 27 after 7 days. The letter V was 
given to the stimulus U by 6 subjects at 1 
day and by 13 subjects after 7 days. In HA 
list Y was given to Z by 16 subjects after 1 
day and 40 subjects after 7 days. It will be 
noted that this error is not a direct alphabeti- 
cal association. However, the correct response 
to Z was X and it may be suspected that the 
learning took place by the use of Y as a 
mediator. Furthermore, inspection of the association 
tables shows that these three letters 
(X, Y, Z) elicit each other with high fre- 
quency. In HA list also, N occurred to M, 
T to S, and S to R, all with fairly high fre- 
quency. Of the total of 538 within-list errors 
(R errors and S errors) observed at recall, 
223 (44%) were letters directly adjacent to 
or one step removed in alphabetical sequence 
from the stimulus term. Of the 67 imported 
letters at recall 57% were of the alphabetical 
type. 
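
The classification of an error as alphabetical can be made explicit with a small sketch. It assumes that "directly adjacent to or one step removed" means an alphabet distance of 1 or 2 in either direction; the text does not state whether direction was considered. The function name and the last example pair are hypothetical.

```python
def is_alphabetical_type(stimulus, response, max_steps=2):
    """True if the response letter lies within `max_steps` alphabet
    positions of the stimulus letter, in either direction."""
    distance = abs(ord(response.upper()) - ord(stimulus.upper()))
    return 1 <= distance <= max_steps


# (stimulus, erroneous response): the first four are errors cited in the
# text; the last is a hypothetical non-alphabetical error.
examples = [("X", "Y"), ("U", "V"), ("Z", "Y"), ("M", "N"), ("X", "L")]
flags = [is_alphabetical_type(s, r) for s, r in examples]
```

Under this reading Y given to Z counts as alphabetical even though, as noted above, it is not a direct (forward) alphabetical association.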

It thus appears that alphabetical sequence 
habits are of considerable importance in pro- 
ducing the errors observed in the present 
experiment. While it was known from the 
association tables that alphabetical sequence 
habits represent a major component in letter 
associations, there was no expectation that 
they would be so dominant in the present 
error data. 


Discussion 


The letter-sequence hypothesis states that 
the lower the initial associative connection 
between letters the greater will be the forget- 
ting. Lower initial association implies greater 
interference during learning and subsequently 
greater interference at recall following re- 
covery of the extinguished erroneous associa- 
tions. The comparison of the recall of HA 
list with LA list provided a test of this hy- 




pothesis. The hypothesis received no support 
since the recall of the two lists was essentially 
equal with equivalent degrees of learning. 

Is the test of the hypothesis an adequate 
one? Perhaps the differences in rate of learn- 
ing the lists might have been due to differ- 
ences other than the intended differences in 
initial associative strength between the letters 
to be learned. Certain possibilities may be 
examined. For example, the response letters 
of LA list are of generally lower frequency 
than those in HA list. Could this have pro- 
duced the observed difference in rate of learn- 
ing the two lists? In a study reported briefly 
elsewhere (Underwood & Schulz, 1961), two 
paired-associate lists were used, one of which 
had the eight highest frequency letters as re- 
sponse terms, the other having the eight with 
the lowest frequency. Although the learning 
rate favored the list of letters with high fre- 
quency, the difference was far from significant 
statistically. There is no reason to believe 
that any factor except the differences in initial 
associative strength was responsible for the 
observed differences in learning the two lists. 
The letter-association norms for the two lists 
indicated greater potential interference for 
LA list, and this is a basic requirement if the 
hypothesis were to be tested adequately. It 
is possible that with greater differences in 
initial associative strengths between letters in 
the two lists differences in retention might 
emerge. The HA list was not “high” in an 
absolute sense. However, it was as high as 
allowed by the letter-association tables if at 
the same time (a) no letters were repeated, 
and (b) a sufficient number of pairs was used 
to require several trials to learn. Repeated 
letters within a list add a complicating factor 
which it seemed wise to avoid in the interests 
of a “pure” test. Nevertheless, support 
for the hypothesis might be obtained if wider 
differences in initial associative strength were 
used and number of repeated letters within 
the two lists equalized. From the present 
data, however, it must be concluded that no 
support was found for the letter-sequence 
hypothesis. 

Just where the theorizing leading to the 
letter-sequence hypothesis went astray is 
difficult to determine. It was derived from 
the general interference theory and the evi- 


dence of the present experiment gives strong 
support to such a general theory. That is, 
the results (error recovery, degree-of-learning 
effects) support the notions that an extinc- 
tionlike process occurs followed by a re- 
covery of the extinguished associations with 
time to produce interference, hence forgetting, 
at recall. The fact is that with equivalent de- 
grees of learning for the two lists the amount 
of interference impinging at recall appears 
equivalent. This might mean, therefore, that 
with a given degree of learning all potential 
interfering tendencies are equally well ex- 
tinguished, and with equivalent recovery rates 
for the error tendencies in both lists the recall 
will be equivalent. The difficulty with such 
a position is that there were presumably more 
potential interfering tendencies for LA list 
than for HA list (see Method); therefore, 
with equal recovery rates more interference 
should have occurred for LA list. To pursue 
this line of explanation requires some equali- 
zation of interference as a consequence of 
differential recovery rates, different numbers 
of interfering associations, or different levels 
of recovery, or some combination of these 
factors. A judicious weighting of these fac- 
tors would allow a prediction of equivalent 
interference at recall for the two lists, but 
this is a distasteful procedure without inde- 
pendent evidence for the factors involved. 

However deficient the letter-sequence hy- 
pothesis has been found to be, it may be noted 
that the present results are quite in line with 
the findings that factors such as meaningful- 
ness and intralist similarity, while producing 
wide differences in rate of learning, do not 
influence retention as measured by recall. 
Presumably, therefore, all such results might 
yield to the same theoretical accounting. With- 
out presuming to suggest such an account, 
certain considerations may be advanced in 
search of one. 

1. Overt-error data may be misleading. 
Overt errors at recall may only be sympto- 
matic of the fact that forgetting has occurred; 
they may not be indicative of the processes 
which caused the forgetting. Other error 
data, derived from a situation in which an 
extinction process seemed clearly to be oc- 
curring, also suggested the fact that errors 




may be misleading (Underwood, Keppel, & 
Schulz, 1962). 

2. The extinction process, said to be oc- 
curring in verbal learning, may not operate 
on specific habits as such but on more gen- 
eral or second-order habits which control the 
specific habits. For example, in the present 
study, where alphabetical sequence habits 
could be inferred from the errors, the extinc- 
tion which occurs may not be of the specific 
habits (e.g., saying Y to X) but of the habit 
of giving alphabetical sequence responses of 
any kind. Certain evidence in the present 
experiment might be used to support this no- 
tion. An analysis was made of the errors at 
recall to see if they had occurred in the sub- 
ject’s learning record to the same stimulus. 
It was discovered that at the 1-day recall 
36% of the errors had occurred in learning 
LA list and 34% in learning HA list. How- 
ever, on the 7-day recall the values were re- 
duced to 29% and 27%, respectively, in spite 
of the fact that many more errors were given 
at 7 than at 1 day. If specific interfering 
habits were recovering it might be expected 
that a greater percentage of the errors at the 
7-day recall than at the 1-day recall would 
have occurred during learning. If generalized 
habits are involved, the lack of differences in 
forgetting the two lists might be said to be 
due to the fact that the generalized or second- 
order habits, whatever they are, will interfere 
equally with the recall of the two lists. 

3. Another conclusion to which certain 
evidence points is that “true” extinction of 
associations per se does not occur in verbal 
learning (Underwood et al., 1962). Rather, 
only verbal units from a list are truly ex- 
tinguished, these being thought of as responses 
associated to the experimental context. Even 
so, this extinction will not occur unless the 
units are no longer appropriate to the situation. 
Thus, true extinction would occur of the 
B terms during learning of A-C in an A-B, 
A-C transfer situation. When interference is 
produced by other associations from within 
the list, the response terms of these associa- 
tions cannot be extinguished since they are 
appropriate responses to other stimuli in the 
list. Just how these erroneous within-list 
associations are reduced or weakened or dis- 
criminated so that they no longer interfere 


in learning is not clear, but the term “sup- 
pression” may be a more appropriate descrip- 
tion than is extinction. Nevertheless, insofar 
as these suppressed associations may produce 
interference at recall, there must be a failure 
of suppression which is just another way of 
speaking of some sort of a recovery process. 
And again, such a mechanism would seem 
to predict better recall of HA list than of LA 
list in the present experiment. 

Certain changes in the detailed mechanisms 
of interference theory appear needed. Perhaps 
the above suggestions will be relevant to these 
changes. However, it should be clear that 
no final accounting has been attempted of 
the equal recall of the two lists observed in 
the present study. 

Absolute Amount of Forgetting. The rate 
of forgetting observed for the lists was rela- 
tively slow. After 1 day, following only two 
learning trials, retention was approximately 
75% (forgetting 25%), and with high de- 
grees of learning only a very slight loss was 
evident. Even after 7 days, retention was 
38% following two learning trials. As noted 
in the introduction, interference theory might 
seem to predict very rapid forgetting of the 
paired letters. During the retention interval 
the subject would surely use, even by writing, 
letter sequences which are contrary to the 
correct associations as learned in the list. 
Thus, the correct association might be ex- 
tinguished by activities occurring outside of 
the laboratory during the retention interval. 
Now, while these activities may have occur- 
red, it is apparent that their occurrence did 
not produce heavy forgetting when viewed 
against the forgetting produced by a formal 
interfering laboratory task. The discrepancy 
between fact and theory is probably not serious. 
A laboratory situation provides a different 
stimulus complex than the situation outside 
the laboratory. The pairs of letters do 
not produce a true A-B, A-C paradigm when 
A-C is considered the interpolated task oc- 
curring outside the laboratory. The stimulus 
situations are in part different, and this may 
“protect” the laboratory associations from 
interference. Thus, just as retroactive inhi- 
bition can be reduced by learning the original 
and interpolated lists in two distinctly differ- 
ent laboratory environments (e.g., Bilodeau 




& Schlosberg, 1951), so too the retroactive 
inhibition of the laboratory associations by 
associations performed outside the laboratory 
may be minimized. 

Degree of Learning and Interference 
Theory. The data have shown that the higher 
the degree of learning of the associations, the 
slower the rate of forgetting. Such a result 
is fully in line with expectations of the gen- 
eral aspects of an interference theory. Not 
only does the strength of the correct asso- 
ciations grow stronger with continued repe- 
titions, but the higher the degree of learning 
the greater the extinction or suppression of 
potentially interfering associations. 

Speaking now in general of the reliable 
facts of forgetting, it appears that degree of 
learning is the only variable involved in a 
substantial way in retention. Indeed, there 
is good reason to believe that if degree of 


learning is equal there will be no individual 
differences in rate of forgetting. There is 
perhaps one minor exception to the general 
statement that degree of learning is the criti- 
cal factor governing rate of forgetting. With 
certain forms of interference, distributed 
practice will facilitate retention even though 
degree of learning by all available measures 
is equivalent following learning by massed 
and by distributed practice (Underwood et 
al., 1962). This fact can also be fitted into 
the general assumptions of an interference 
theory, with the distributed practice allowing 
a more thorough extinction of interfering associations. 
Yet the working out of the detailed 
mechanisms of extinction involves the 
same problems discussed earlier and which 
were posed by the failure to find a difference 
in the retention of the two lists in the present 
investigation. 


REFERENCES 


Barnes, J. M., & Underwood, B. J. “Fate” of first-list 
associations in transfer theory. J. exp. Psychol., 
1959, 58, 97-105. 

Bilodeau, I. McD., & Schlosberg, H. Similarity in 
stimulating conditions as a variable in retroactive 
inhibition. J. exp. Psychol., 1951, 41, 199-204. 

Briggs, G. E. Acquisition, extinction, and recovery 
functions in retroactive inhibition. J. exp. Psychol., 
1954, 47, 285-293. 

Estes, W. K. Learning theory and the new “mental 
chemistry.” Psychol. Rev., 1960, 67, 207-223. 

Koppenaal, R. J., & O'Hara, G. N. The combined 
effects of retroaction and proaction. Canad. J. Psychol., 
1962, 16, 96-105. 

Krueger, W. C. F. The effect of overlearning on 
retention. J. exp. Psychol., 1929, 12, 71-78. 

Postman, L. The present status of interference theory. 
In C. N. Cofer (Ed.), Verbal learning and 
verbal behavior. New York: McGraw-Hill, 1961. 
Pp. 152-179. 

Underwood, B. J. Speed of learning and amount retained: 
A consideration of methodology. Psychol. 
Bull., 1954, 51, 276-282. 

Underwood, B. J. Interference and forgetting. Psychol. 
Rev., 1957, 64, 49-60. 

Underwood, B. J., & Keppel, G. An evaluation of 
two problems of method in the study of retention. 
Amer. J. Psychol., 1962, 75, 1-17. 

Underwood, B. J., Keppel, G., & Schulz, R. W. 
Studies of distributed practice: XXII. Some conditions 
which enhance retention. J. exp. Psychol., 
1962, 64, 355-363. 

Underwood, B. J., & Postman, L. Extraexperimental 
sources of interference in forgetting. Psychol. Rev., 
1960, 67, 73-95. 

Underwood, B. J., & Richardson, J. The influence 
of meaningfulness, intralist similarity, and serial 
position on retention. J. exp. Psychol., 1956, 52, 
119-126. 

Underwood, B. J., & Schulz, R. W. Meaningfulness 
and verbal learning. Chicago: Lippincott, 1960. 

Underwood, B. J., & Schulz, R. W. Studies of distributed 
practice: XXI. Effect of interference 
from language habits. J. exp. Psychol., 1961, 62, 
571-575. 

Wollen, K. A. One-trial versus incremental-paired 
associate learning. J. verbal Learn. verbal Behav., 
1962, 1, 14-21. 


(Received September 15, 1962) 


Vol. 77, No. 5 Whole No. 568, 1963 


Psychological Monographs: General and Applied 


IMPRESSION FORMATION AS A FUNCTION OF 
ADJUSTMENT¹ 


ANTHONY J. MATKOM² 
Northwestern University 


It was hypothesized that processes such as differentiation between Real and 
Apparent levels of personality, resistance to biasing effect of expectations, and 
feeling of certainty of one’s impressions of others, rather than “accuracy” of 
perception, contribute most significantly to difference between well-adjusted and 
maladjusted behavior. 60 maladjusted and 60 well-adjusted freshmen viewed 
a person on screen after receiving favorable, unfavorable, or no information 
about him. Then, Ss rated the person in terms of what he “appeared” to 
be and what he “really was.” They also indicated how certain they were of 
their ratings. It was found that degree of differentiation between Real and 
Apparent personality and degree of certainty of rating are a function of S's 
adjustment. Effect of preinformation was found not to be a function. Most 
findings were significant (p < .001). 


Within the framework of the “interpersonal 
theory of personality formation,” 
distortions in the area of the perception of 
other people in our environment are an essen- 
tial consideration in any attempt to account for 
inappropriate forms of human behavior, since 
a great deal of our interpersonal behavior is 
a function of our perception of other people’s 
qualities and intentions. Misperception of 
these qualities and intentions is believed to 
lead to, and also to be the product of, maladjustment. 
For this reason, the study of the 
perception of other people’s qualities and in- 
tentions seems to provide a basic approach to 
the study of maladaptive behavior. 
Impression Formation. Such perceptual 
processes as the discrimination between what 
is apparent and what is real in others, the bi- 
asing effects of expectations, and the certainty 
of impressions lend themselves to experimental 
investigation. The study of these processes 
is the study of the process of perception itself, 
and belongs to the most recent trend in the 
study of interpersonal perception, usually re- 


¹ The present research was conducted in partial fulfillment 
of the requirements for the PhD degree under 
sponsorship of L. B. Sechrest at Northwestern 
University. Appreciation is also extended to R. I. 
Watson, D. T. Campbell, and H. C. Quay for their 
assistance. 

² Now at Mendota State Hospital, Madison, Wisconsin. 


ferred to as research in the field of impression 
formation, or person perception. For the purpose 
of the present study, impression formation 
is seen as a product of two interacting 
processes—of the expectations one has of an- 
other person and of the “objective” perceptual 
information from that person. The product 
of the interaction of these two components is 
what is called the “impression of another per- 
son.” To be influenced by the component of 
expectation more than by the component of 
direct perception of the situation is to be 
fantasy dominated rather than reality oriented, 
and this is believed to be more characteristic 
of maladjusted than of well-adjusted 
individuals. For this reason in the present 
study it is assumed that the process of im- 
pression formation of an individual is func- 
tionally related to the quality of his adjust- 
ment. 

Asch’s (1946) study is usually considered 
the first experimental attempt to investigate 
the process of impression formation. Asch’s 
procedure consisted of reading to groups of 
subjects a list of trait names said to denote 
characteristics belonging to a particular per- 
son, asking them to form an impression of this 
person. The lists were identical except that 
one list contained the adjective cold, the 
other contained the adjective warm. Asch 
found that subjects formed very clear impres- 




sions of the person and that these impressions 
differed markedly for the two lists. This study 
has been particularly significant in the stimu- 
lation of theorizing and research in the area of 
impression formation. It has not, however, 
gone uncriticized. The sharpest criticism came 
from Luchins (1948) who writes, 

While we are in theoretical sympathy with the con- 
tent of Asch’s conclusions and implications, and while 
we believe that subsequent experimentation may 
attest to their validity, it is our opinion that they 
were drawn from experiments which had little rela- 
tionship to actual impressions of personality and 
that it was not clearly established how they logically 
follow the results. . . . Asch’s study . . . dealt es- 
sentially with the organization of words into a de- 
scription of a person [p. 321]. 


Luchins himself repeated one of Asch’s studies 
employing precisely the same instructions as 
Asch did; yet his results differed sharply from 
those Asch found. Mensh and Wishner (1947), 
and Kelley (1950), on the other hand, re- 
peated Asch’s experiment with different 
groups, and obtained essentially the same re- 
sults as those of Asch. 

To this writer it seems that Asch was not 
dealing with the entire process of impression 
formation, that he was investigating only the 
expectation component and did not deal with 
its perceptual aspect. It seems that Asch’s 
question to his subjects could, without produc- 
ing any change in results, be formulated as 
follows: What other traits do you consider 
consistent with the traits I have just men- 
tioned to you? Thus, Asch’s study remains 
only tangentially related to the present in- 
vestigation for two main reasons: (a) he did 
not study the impression formation as it is 
understood in this investigation, (b) he left
the dimension of adjustment of his judges en- 
tirely out of the consideration. 

A great contribution to the research on im- 
pression formation has been made by Secord 
and his collaborators. In a series of studies 
they investigated the effect of facial features 
(physiognomy) on the process of impression 
formation of persons. They used photographs 
of human faces as the stimuli (Secord & 
Bevan, 1956; Secord & Muthard, 1955a, 
1955b). Secord and his colleagues are aware 
of the possibility that by using photographs 
they are taking the risk that the cues which 
are most important in interpersonal situations
have been eliminated by the design. But, on
the more positive side, in discussing several 
forms of perceptual distortions (such as para- 
taxis and categorization) and various types of 
motivational determinants (such as projec- 
tion) which seem to lead to disagreement 
among perceivers, Secord (1958) is prepar- 
ing grounds for the consideration of the varia- 
ble of adjustment in the study of impression 
formation, which is one of the main aspects 
of the present study. In this sense Secord’s 
work provides a more immediate background 
for this study than does the work of Asch. 
Other studies of person perception have 
approached the problem of impression forma- 
tion via the investigation of the “accuracy of 
perception” or the “predictive ability” of their 
human subjects, and the relationship of such 
capacities to adjustment. The results of such 
studies, however, have not been convincing. 
Only with hesitation and qualification could 
Taft (1955), after a review of the literature, 
conclude that adjustment is related to ac- 
curacy of person perception. Tagiuri (1958, 
p. 324), referring to the studies on the accu- 
racy of perception, said that these studies 
yield data that are inconsistent, difficult to in- 
terpret, and impossible to compare. He added, 
“It is the process rather than its achievement
that one must investigate if a broad under- 
standing of the phenomenon is to be reached.” 
It is possible that the importance of accu- 
racy of perception for adequate personal ad- 
justment has been overestimated. Even if the 
interpersonal perception of well-adjusted indi- 
viduals differs from that of poorly adjusted 
persons, the question remains whether or not 
accuracy is the particular aspect of interper- 
sonal perception which is related to the differ- 
ences in adjustment. It might be, for example, 
that only a minimal degree of accuracy is in- 
dispensable in order to get along in this world, 
but that the large majority of people possess 
this degree of accuracy and that the variations 
in accuracy within the nonpsychiatric popula- 
tion may not be significantly related to the 
differences in adjustment within this popula- 
tion. In that event, qualities of interpersonal 
perception other than accuracy should be 
taken into consideration in an attempt to 
throw some light on the differences between 
well-adjusted and poorly adjusted behavior.
Such reasoning suggests an alternative to the
investigation of accuracy as an approach to
the study of interpersonal perception. Such
an approach would disregard accuracy and
attempt to analyze the process of impression
formation itself. This is the approach
adopted in this study, which is concerned
primarily with the relationship between
interpersonal perception and the
mode of adjustment, examined through an
analysis of the impression formation process.

As a very general guiding framework, it was 
assumed that the following three variables of 
interpersonal perception are functionally re- 
lated to the differences between well-adjusted 
and maladjusted behavior: The degree of dif- 
ferentiation between the perception of Real 
and Apparent personalities, the biasing effect 
of expectations upon perception of others, and 
the feeling of certainty of one’s judgment 
about others. We turn next to a more detailed 
discussion of each of these hypotheses. 

Real and Apparent Levels of Personality 
Perception. Common sense suggests that indi- 
vidual differences in adjustment (or personal- 
ity integration) are more likely to appear 
when a person has the opportunity to differ- 
entiate between what a person appears to be 
and what he is really like. If an individual 
presents an outwardly friendly approach, it 
seems likely that this approach will be per- 
ceived as friendly, and equally so, by most 
normal people regardless of their level of per- 
sonal adjustment. There may, however, be 
much less agreement in the matter of the in- 
tention or motivation behind such manifestly 
friendly behavior; that is, although we agree 
that the approach appears friendly, we may 
not agree as to just how friendly it really is. 

Our belief is that the difference between 
well-adjusted and maladjusted subjects is more 
apt to be expressed in their perception of the 
“hidden” (real or inferred) personality of the 
stimulus person than in their perception of 
the manifest form of the same person. If this 
reasoning is correct, it will be necessary to 
introduce the distinction between the Real 
personality and the Apparent personality per- 
ceived in impression formation, in order to 
understand the relationship between percep- 
tion and adjustment. The failure of previous 
studies to recognize this distinction or to control
for it may account in part for the unsatisfactory
amount of conclusive knowledge in
this area: if some judges in an experiment 
are rating their subjects in terms of their Real 
personality, and others in terms of their Ap- 
parent personality, and if still others tend to 
shift from one position to the other during 
their rating, then the results can hardly be 
expected to be predictable. 

The differentiation between Apparent per- 
sonality and Real personality develops in nor- 
mal everyday living. In our culture, striving 
to impress others favorably, and to bias them 
about one’s self, is not only considered accept- 
able but is encouraged and rewarded. There- 
fore, to make this distinction in the interpre- 
tation of the behavior of others, and to accept 
it, is a sign of maturity and of a realistic out- 
look on life. On the other hand, a failure to 
recognize this distinction, a minimization of 
its importance or, for that matter, extreme 
exaggeration of its significance may indicate 
an immature level of social development. 

In this experiment, the subjects made rat- 
ings both of what a person “appears to be” and 
what he “really is.” We shall refer to the dif- 
ference between these ratings as a discrepancy 
measure or discrepancy score. Within the “nor- 
mal” population our guess is that the discrep- 
ancy score should increase with increasing 
levels of maladjustment. Beyond the normal 
range, however, there are probably individuals 
whose differentiation between what the rated 
person “appears” to be and what he “really” 
is has become hazy and uncertain. At this 
level we might expect to find that what the 
perceived person appears to be would begin 
to lose its stability and to move closer to what 
the individual believes the other person really 
is. This would be the range of disturbance 
where abnormality is a better descriptive term 
than maladjustment. Within this range one 
would not be surprised if a genuinely and 
manifestly friendly approach would be per- 
ceived as manifestly (not only inferentially) 
hostile. In this study, however, all the subjects 
belong to the normal population, and it is 
expected that they will employ the difference 
between the Real and the Apparent levels of 
personality in the rating of the stimulus per- 
son. Moreover, it is expected that increasing 
maladjustment will be accompanied by an
increase in the magnitude of the discrepancy
score. This is the first hypothesis tested in 
this study. 

Biasing Effect of Expectations. It is as- 
sumed that, when making judgments about 
their physical or human environments, people 
always depend upon their expectations as well 
as upon items of more direct perceptual infor- 
mation. It seems likely that the difference be- 
tween poorly adjusted and well-adjusted indi- 
viduals is related to the balance or relative 
reliance placed upon these two sets of factors. 
Thus, good adjustment may be characterized 
by the ability of the individual to direct him- 
self more by objective perceptual facts than by 
expectations he may have about his human en- 
vironment, when making judgments about it. 
In line with this type of reasoning it is hy- 
pothesized that with the maladjusted subjects, 
because of their tendency to go more by their 
expectations than by direct perceptual experi- 
ence, items of previous information will have a 
greater effect on their rating of the stimulus 
person than will be the case with the well-ad- 
justed subjects. This is the second hypothesis 
tested in this study. 

Certainty of Impressions. Dailey (1952) 
found that the tendency to arrive at quick 
judgments makes new data harder to assimi- 
late than when the observer withholds judg- 
ment until all the evidence is in. The readi- 
ness of an individual to form a judgment and 
act on it in spite of insufficient evidence is 
likely to be very closely related to the variable 
of adjustment, apart from the accuracy of the 
judgment. If two people are equally inaccurate 
in their first impression of a third person,
then the one who is more certain of his im- 
pression is likely to make more mistakes in 
his dealings with the third person. The one 
who hesitates to make a definite judgment is 
more likely to search for further evidence and 
to correct some of his initial misperceptions. 
To the extent that adequate dealings with 
other persons are a mark of adjustment, and 
to the extent that such adequacy requires re- 
serving judgment with respect to others, we 
may expect well-adjusted subjects to show less 
certainty in their judgments of others than the 
poorly adjusted judges. This is the third hy- 
pothesis tested in this study. 

The general hypotheses outlined above have
several aspects. Spelled out in more detail
than was done above, these hypotheses are as 
follows: 


1. The discrepancy between subjects’ rat- 
ings of the Real and the Apparent aspects of 
the judged person will be significantly greater 
in the maladjusted than in the well-adjusted 
subjects. 

2. All, but particularly poorly adjusted sub- 
jects, will rate the Real person more unfavor- 
ably than the Apparent person. 

3. In their ratings of the Apparent aspect 
of the judged person, the difference between 
well-adjusted and maladjusted subjects will be 
smaller than the difference in their ratings of 
the Real aspect of the judged person.

4. In their ratings of the Real aspect of 
the stimulus person, the maladjusted subjects 
will be more influenced in the expected direc- 
tion, by the preinformation, than the well- 
adjusted subjects. 

5. The preinformation will have a significant 
effect on the subjects’ ratings of the stimulus 
person, particularly so on their ratings of the 
Real aspect of the stimulus person, regardless 
of the adjustment of the subjects. 

6. Well-adjusted subjects will be less cer- 
tain of their judgments (ratings) than the 
poorly adjusted subjects. 


METHOD 


In order to test the predictions just described, an 
experiment in person perception was performed in- 
volving a design intended to provide for a manipula- 
tion of the subjects’ adjustment and expectations. The 
subjects who differed in level of adjustment were 
selected in terms of their responses to the Rorschach 
test. Subgroups of each of these major sets of sub- 
jects were then provided with positive, negative, or 
no information about a person whom they were to 
evaluate. The effects of these variables, which combine
to form a 2 × 3 factorial design, were assessed
in the subjects’ performance on a set of rating scales 
which reflected their judgments of the personalities 
of other persons. 


Subjects 


The subjects in this study were 120 freshmen from 
an introductory psychology class at Northwestern 
University. They were assigned, in equal numbers, 
to six groups (positive, negative, or no preinforma- 
tion presented to well-adjusted and maladjusted 
groups of subjects) in the following way. As he 
appeared, each subject was assigned to the group
within his adjustment classification to which neither
of the two preceding subjects had been assigned. 
It was expected that this method should provide for 
a simultaneous and nearly even building up of all 
groups, including an equal number of male and female 
subjects within each group. In practice, however, the 
method led to unexpected minor problems due to the 
unforeseen circumstance that it was considerably 
more difficult to find “well-adjusted” than “malad- 
justed” male subjects, and it was slightly easier to 
find well-adjusted than maladjusted female subjects. 
Thus, the method of assignment, while working satis- 
factorily within each of the separate adjustment clas- 
sifications, did not produce the same distribution of 
subjects between adjustment classifications. The 
female well-adjusted group grew in numbers slightly 
faster than the female maladjusted group, while the 
male maladjusted group grew considerably faster 
than did the male well-adjusted group. Because 
of this unexpected development, it was necessary to 
modify some of the minor details of the experi- 
mental design. Thus instead of having 10 male and 
10 female subjects, as originally planned, the study 
was completed with 14 female and 6 male subjects in 
each group. Certain details of procedure also had 
to be modified in ways to be described below. 
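The assignment rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (each arriving subject goes to the preinformation group, within his adjustment classification, that neither of the two preceding subjects of that classification received). The function name `assign_groups`, the labels, and the starting order of the rotation are assumptions, not details given in the text.

```python
from collections import defaultdict

# Hypothetical labels for the three preinformation conditions.
PREINFO = ("positive", "negative", "none")


def assign_groups(arrivals):
    """Assign subjects to preinformation groups in order of appearance.

    arrivals: iterable of (subject_id, adjustment) pairs, adjustment "W" or "M".
    Returns a dict mapping subject_id to (adjustment, preinformation).
    """
    history = defaultdict(list)  # conditions already used, per adjustment class
    assignment = {}
    for subject_id, adjustment in arrivals:
        last_two = history[adjustment][-2:]
        # Take the first condition not given to either of the two preceding
        # subjects in the same adjustment classification.
        choice = next(c for c in PREINFO if c not in last_two)
        history[adjustment].append(choice)
        assignment[subject_id] = (adjustment, choice)
    return assignment


arrivals = [(1, "W"), (2, "M"), (3, "W"), (4, "W"), (5, "M"), (6, "W")]
result = assign_groups(arrivals)
```

Because there are exactly three conditions, the rule amounts to a rotation within each adjustment classification, which is why the groups fill at a nearly even rate.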

Each subject took part in two separate sessions, 
which usually took place several days apart. The 
first session consisted of a group Rorschach test and 
a rating of the Average American. The group Ror- 
schach test, with colored slides, was administered 
to introductory psychology students until a group 
of 120 subjects, 60 of whom could be classified as 
maladjusted and 60 of whom could be classified as 
well-adjusted, were secured. 

The group Rorschach test was introduced to the 
subjects with the following explanation: 


The test in which you will participate now is 
the inkblot test. All you have to do is to look at 
the inkblots which will be projected on the screen 
and write down what you see. 

Probably all of you, at one time or another, have 
shaken your pen on a piece of paper, made a blot 
of ink, folded the paper, and squeezed it into a 
design which may or may not have resembled 
something that you recognized. That is the way 
the inkblots that you will see on the screen were 
formed. 

Your task is simply to write down what these 
blots remind you of, resemble, or might be. You 
will see each of these slides or blots for 2 minutes 
and you may write your answers during that time. 

There are no right or wrong responses. You can 
write as much as you can in 2 minutes, and you 
can give as many responses as you like—the more 
the better. Is that understood?


One out of every three subjects who volunteered 
for this experiment was excluded from the experiment 
because his Rorschach protocol did not provide 
sufficient evidence for assigning him to either of the 
two major adjustment classifications. The remain- 
ing subjects were divided into subgroups of well- 
adjusted and maladjusted subjects by the experimenter,
who employed his clinical impression and
relied heavily upon content categories in making his 
judgment. Table 1 shows the frequencies with which 
certain of the more highly differentiating categories 
were employed. The reliability of the rating pro- 
cedure was established by having two other judges 
perform the same classification. The degree of agree- 
ment among judges was assessed by means of the 
phi coefficient. Correlations estimated in this way 
were .60 between the one judge and the experi- 
menter and .54 between the other judge and the 
experimenter. The correlation between the ratings 
of the two judges was .75. 
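As an illustration of the agreement measure named above, the phi coefficient for two judges' dichotomous classifications can be computed from the 2 × 2 table of their joint decisions. The sketch below uses the standard formula; the function name and the example classifications are invented for illustration and are not the study's data.

```python
import math


def phi_coefficient(judge_a, judge_b):
    """Phi coefficient between two dichotomous classifications.

    Each argument is a sequence of truthy/falsy values, e.g. 1 for
    "well-adjusted" and 0 for "maladjusted", one entry per protocol.
    """
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(judge_a, judge_b):
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (n11 * n00 - n10 * n01) / denom


# Invented example: eight protocols classified by two judges.
judge_a = [1, 1, 1, 1, 0, 0, 0, 0]
judge_b = [1, 1, 1, 0, 0, 0, 0, 1]
phi = phi_coefficient(judge_a, judge_b)  # 0.5 for this invented data
```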

For the purpose of this investigation, the two 
major groups will be referred to as well-adjusted (W) 
and maladjusted (M) groups. The use of these terms 
is reasonable for the following reasons: (a) In terms 
of the typical clinical interpretation, it is believed 
that “poor” Rorschachs correlate with what is 
clinically understood as maladjusted behavior, and 
“good” Rorschachs correlate with its opposite, well- 
adjusted behavior. (b) There was a modest correla- 
tion between the quality of the Rorschach protocols 
and the quality of general emotional and social ad- 
justment as reflected by the subjects’ self-ratings on 
an Adjustment Scale. The Adjustment Scale was a 
6-point scale on which each subject was asked to 
rate himself in terms of his own emotional and social 
adjustment. This scale was presented to each subject 
at the end of his participation in the experiment. The 
point-biserial coefficient of correlation between the 
subjects’ self-ratings and the experimenter’s judgment 
of the quality of their Rorschachs as good or poor 
was .29 (p < .01). Although this correlation is not
high in absolute terms, it becomes somewhat more 
impressive when it is understood that this study did 
not deal with the extremes of adjustment, but rather 
levels of adjustment which, from a psychiatric point 
of view, would all fall within the normal range. 
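For readers unfamiliar with the statistic, a point-biserial correlation relates a dichotomy (here, a good versus poor Rorschach) to a graded score (here, a self-rating on a 6-point scale). The following is a minimal sketch using the textbook formula; the function name and the example values are invented and do not reproduce the study's data.

```python
import math


def point_biserial(binary, scores):
    """Point-biserial correlation between a dichotomy and a numeric score.

    binary: sequence of truthy/falsy group labels (e.g. 1 = good Rorschach).
    scores: sequence of numeric ratings of the same length.
    """
    n = len(scores)
    group1 = [s for b, s in zip(binary, scores) if b]
    group0 = [s for b, s in zip(binary, scores) if not b]
    if not group1 or not group0:
        raise ValueError("both groups must contain at least one subject")
    mean = sum(scores) / n
    # Standard deviation of all scores, computed with n in the denominator.
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    if sd == 0:
        raise ValueError("scores have zero variance")
    p = len(group1) / n
    mean_diff = sum(group1) / len(group1) - sum(group0) / len(group0)
    return mean_diff / sd * math.sqrt(p * (1 - p))


# Invented example: four subjects, two per group.
r_pb = point_biserial([1, 1, 0, 0], [5, 4, 3, 2])
```

The value is numerically identical to a Pearson correlation computed with the dichotomy coded 0 and 1.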

TABLE 1
FREQUENCIES OF SOME CONTENT CATEGORIES IN 30 WELL-ADJUSTED
AND 30 MALADJUSTED RORSCHACH PROTOCOLS

                                Maladjusted    Well-adjusted
Content categories              subjects       subjects
Blood                                9              0
Anatomy                             38             13
Unpleasant scene (a)                41              8
Destruction, disintegration         11              1
Dirt, mess, uncleanliness            6              0
Ominous sign (b)                    14              1
Fighting                            25              8
Distorted figures                   14              7
Mishap, accident                     6              2

a Examples: “Old man lying dead on the sidewalk”; “Downhearted
butterfly”; “A smashed crayfish.”
b Examples: “Will die because supports are splitting down the
middle”; “Gods of death watching the Earth.”

During the first session, the subjects were also
asked to rate the Average American on two identical
sets of trait scales. The rating sheets consist of
two identical sets of 18 6-point (see Table 11) trait
scales. All ratings of all subjects were done on identical
rating sheets. Nine of the 18 traits (standard
traits) were not used for preinformation purposes 
in any case. This provides for a core of traits, not 
experimentally manipulated and the same for all 
subjects. The other 9 traits (variable traits) were 
used as a pool out of which sets of traits (3 in a set), 
were randomly drawn, for the purpose of forming 
preinformation sets. The decision to use trait scales 
instead of trait opposites as has been done in most 
Asch-type experiments was based on the belief that
the difference in perception among people is more
likely to be in terms of the degree of, say, “friendliness”
or degree of “unfriendliness” rather than in
terms of the opposites, friendliness and unfriendliness.
In the same way a subject is not expected to perceive 
an “unfriendly” stimulus person as “friendly” just 
because he was given a certain kind of preinforma- 
tion; more realistically, as a result of having received 
“positive” preinformation, it is expected that he will 
perceive the unfriendly stimulus person as somewhat 
less unfriendly. The method of trait opposites is not 
suitable for such purposes because it tends to distort 
the subjects’ ratings in two directions: it either 
eliminates the subtle differences or exaggerates them 
out of proportion. Instructions for this rating were 
given as follows: 


On the rating sheet in front of you, you can see 
two identical columns of 18 trait scales. Above the 
left column it says “what the person appears to 
be,” and above the right column it says “what the 
person really is.” 

The distinction between what a person appears 
to be and what a person really is is something we 
do continuously in everyday life. We know that 
people are not always what they appear to be. The 
better we know a person the better we can tell 
what that person really is, in contrast to what he 
appears to be. 

On this sheet I would like you to rate the 
Average American as you conceived him on all of 
the 18 traits and in both ways, in terms of what 
he appears to be and in terms of what you think 
he really is. That is why there are two columns
of traits, one for you to rate the Average Ameri- 
can in terms of what he appears to be and the 
other to rate him in terms of what you think he 
really is. 

Each scale describes the two opposite sides of a 
trait, and consists of six spaces. You put a check 
mark in that space on each scale that best de- 
scribes your subject, who is the Average American. 

Men will rate the Average American man, and 
women will rate the Average American woman. 
Rate your subject on all the 18 traits on your rat- 
ing sheet. Do not proceed down the left column 
first, then down the right column, but rather go 
across the page. Rate your subject on the trait 1 
in terms of what he appears to be and then rate
him on the same trait in terms of what he really is,
before you go to the next trait N.2. 


Only a few subjects asked for further explanation 
as to what is meant by the Average American or 
voiced confusion as to what was meant by Apparent 
and Real aspects of another person. 

There were several reasons for having the subjects 
rate the Average American. One of the reasons was 
the familiarization of the subjects with the construc- 
tion of the rating sheet, and practice in differentiating 
between Apparent and Real personalities. Also, it was 
desirable to have at least one procedure in which all 
subjects in the experiment took part under identical 
conditions. Thus, the rating of the Average American 
serves as a check on whether the different groups 
were comparable at the beginning of the experiment. 
In addition, it was speculated that when describing 
another person, the subjects may be using their own 
idea of the Average American as an anchorage or 
reference point for their judgments. If this were the 
case, the rating of the Average American could be 
used as a reference point in estimating the biasing 
effect of preinformation on different groups of sub- 
jects. Finally, the rating of the Average American 
was seen as an economical method for providing an 
experimental double check on some of the phenomena 
under investigation in this study. 

During the second session of the experiment the 
subjects were treated individually. Individual treat- 
ment was necessary at this stage because of the intro- 
duction of the differential treatment among different 
groups. 

For the four groups that were given the preinformation
stimulus (i.e., the well-adjusted group
with positive preinformation, WPP; the well-adjusted
group with negative preinformation, WNP; the maladjusted
group with positive preinformation, MPP;
and the maladjusted group with negative preinformation,
MNP), the first part of the second session consisted
of rating the “unknown person.” This procedure
was introduced by the experimenter telling the
subject:


I am going to describe for you a person in 
terms of how most people who know this person 
well see him [or her]. Some of the ways in which 
most of these people agree about him is that he is 
. . . [three traits]. This, of course, is a very incomplete
description of this person, but just the
same I would like you to describe this person in 
terms of these 18 traits on your rating sheets, as 
you would imagine him on the basis of the infor- 
mation I gave you. 

At the same time I would like you to indicate 
by numbers from 1 to 4 how certain you are of 
any one of your ratings. N.1 stands for very un- 
certain, N.4 stands for very certain; you can use 
N.2 and N.3 for in-between cases. Again, N.1
stands for very uncertain, N.4 stands for very 
certain. 


We shall use the term “preinformation” to refer to 
sets of three trait names which were presented 
verbally to the subject at this stage of the experiment
as a partial description of the person the subject
was to view on the screen. Each preinformation
set was drawn randomly from a common pool of nine 
trait names (INDUSTRIOUS, SCRUPULOUS, HUMANE, 
SERIOUS, INTELLIGENT, PRACTICAL, POLITE, THOUGHTFUL,
and WARM) and their opposites. Three types of
preinformation (positive, negative, and no preinfor- 
mation) were used with equal numbers of adjusted 
and maladjusted groups of subjects. Positive pre- 
information was made up of selections of the adjec- 
tives as listed above (e.g., INDUSTRIOUS, INTELLIGENT, 
and POLITE) while the opposite (socially less desir- 
able) sides of these dimensions were used for nega- 
tive preinformation (e.g., LAZY, NOT VERY INTELLI- 
GENT, and BLUNT). In the “no preinformation” 
condition, preinformation was omitted. Together 
with the adjustment categories, these types of pre- 
information generated six experimental conditions: 
MPP; MNP; maladjusted, no preinformation
(MnP); WPP; WNP; and well-adjusted, no pre- 
information (WnP). 
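The construction of the preinformation sets and the six experimental conditions can be sketched as follows. This is an illustrative reading of the description above, not the experimenter's actual procedure. The function name is hypothetical. Only the three opposites named in the text (LAZY, NOT VERY INTELLIGENT, BLUNT) are listed, and any other opposite is represented by a placeholder string because the source does not give it.

```python
import random

# The pool of nine variable traits listed in the text.
VARIABLE_TRAITS = (
    "INDUSTRIOUS", "SCRUPULOUS", "HUMANE", "SERIOUS", "INTELLIGENT",
    "PRACTICAL", "POLITE", "THOUGHTFUL", "WARM",
)

# Only these three opposites are stated in the text.
KNOWN_OPPOSITES = {
    "INDUSTRIOUS": "LAZY",
    "INTELLIGENT": "NOT VERY INTELLIGENT",
    "POLITE": "BLUNT",
}


def draw_preinformation(kind, rng):
    """Return a set of three trait names for one subject.

    kind: "positive", "negative", or "none".
    rng: a random.Random instance, so that draws can be reproduced.
    """
    if kind == "none":
        return ()
    traits = rng.sample(VARIABLE_TRAITS, 3)  # three distinct traits per set
    if kind == "positive":
        return tuple(traits)
    if kind == "negative":
        # Placeholder label where the source does not name the opposite.
        return tuple(KNOWN_OPPOSITES.get(t, f"opposite of {t}") for t in traits)
    raise ValueError(f"unknown preinformation type: {kind}")


# The 2 x 3 design: adjustment (W, M) crossed with preinformation type.
CONDITIONS = {
    adjustment + code: (adjustment, kind)
    for adjustment in ("W", "M")
    for code, kind in (("PP", "positive"), ("NP", "negative"), ("nP", "none"))
}

rng = random.Random(0)
positive_set = draw_preinformation("positive", rng)
negative_set = draw_preinformation("negative", rng)
```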

When rating the unknown person, the subjects 
were not requested to differentiate between the 
Apparent and the Real person; consequently, only 
one set of 18 traits was utilized for this operation. 

The major reason for including the rating of the 
unknown person was to create a positive or a nega- 
tive set in the subject and to allow him to dwell 
upon it for a few minutes, while rating the unknown 
person. 

The certainty measurement was introduced at this 
point for two reasons: (a) to provide practice for 
the subject before he was asked to rate the stimulus 
person, and (b) to provide for him a way with
which to express any hesitation he might have about
rating a person unknown to him.

Immediately after the rating of the unknown person
the subject was told:


Now I am going to show you on the screen, the 
very same person that I was describing to you a 
few minutes ago.


Each subject was then shown one of four 6-minute,
colored, sound motion pictures of an interview situation.
On two of the films the interviewees were
men, and on the other two films they were women.
Male subjects viewed a man; female subjects viewed
a woman. Originally it was planned that each of the
four films would be viewed by five subjects in each 
of the six groups. Primarily because of the difficulty 
in obtaining the well-adjusted male subjects, the 
numbers of male subjects who witnessed each viewing 
were 5, 3, 3, and 1. This distribution, however, was 
maintained across all of the six groups. 

The films were produced by Cline and Richards 
for their studies of the accuracy of interpersonal 
perception, and have been described in detail else- 
where (Cline & Richards, 1960). In brief, however, 
the interviewer probed the following areas: (a) 
personal values, (b) hobbies and activities, (c) self- 
conception, and (d) temper. For this study the ex- 
perimenter selected film interviews in which the 
interviewees were young adults. 


After the film presentation, the subject was given 
the following instructions: 


Now I want you to describe this person in both 
ways, in terms of what he appears to be, and in 
terms of what you think he really is. Base your 
judgment only on what you saw on the screen.
Also indicate how certain you are of any one of
your ratings, separately for the left and for the
right columns on your rating sheet. Go only by
what you saw on the screen. Go only by what
you saw on the screen. 


The last statement which requested the subject 
to base his judgment only on what he saw on the 
screen was repeated and emphasized, because it was 
believed that in this way only the involuntary effect 
of the preinformation stimulus upon the subject's 
perception of the stimulus person would be reflected 
in his rating, while the conscious and direct utiliza- 
tion of the preinformation would be reduced to a 
minimum. 

The procedure for Groups MnP and WnP was 
somewhat different, in that they were not asked to 
rate an unknown person and were not given any 
preinformation. Instead they were asked to repeat
the rating of the Average American. This was fol- 
lowed by the film presentation and the subsequent 
rating of the stimulus person. With these two groups 
there was no attempt on the part of the experimenter 
to influence their perceptions or their judgments of the
stimulus person by giving them any kind of sug- 
gestions. These two groups can be thought of as 
control groups. 

Differential treatments for various groups in this
experiment are summarized in Table 2.

TABLE 2
SUMMARY OF DIFFERENTIAL TREATMENTS FOR ALL GROUPS

Group   First session                             Second session
        (taken in groups of 10 subjects)          (taken individually)

WPP     Rating of Average American (7 minutes)    Rating of unknown person; positive preinformation (5 minutes)
        Group Rorschach test (20 minutes)         Viewing of stimulus person (6 minutes)
                                                  Rating of stimulus person (10 minutes)

WNP     Rating of Average American (7 minutes)    Rating of unknown person; negative preinformation (5 minutes)
        Group Rorschach test (20 minutes)         Viewing of stimulus person (6 minutes)
                                                  Rating of stimulus person (10 minutes)

WnP     Rating of Average American (7 minutes)    Rating of Average American repeated (5 minutes)
        Group Rorschach test (20 minutes)         Viewing of stimulus person (6 minutes)
                                                  Rating of stimulus person (10 minutes)

MPP     Rating of Average American (7 minutes)    Rating of unknown person; positive preinformation (5 minutes)
        Group Rorschach test (20 minutes)         Viewing of stimulus person (6 minutes)
                                                  Rating of stimulus person (10 minutes)

MNP     Rating of Average American (7 minutes)    Rating of unknown person; negative preinformation (5 minutes)
        Group Rorschach test (20 minutes)         Viewing of stimulus person (6 minutes)
                                                  Rating of stimulus person (10 minutes)

MnP     Rating of Average American (7 minutes)    Rating of Average American repeated (5 minutes)
        Group Rorschach test (20 minutes)         Viewing of stimulus person (6 minutes)
                                                  Rating of stimulus person (10 minutes)


RESULTS


This study is primarily concerned with the
differences between well-adjusted and maladjusted
subjects in ratings of personality.
These differences are expected to manifest
themselves in (a) the subjects’ differentiations
between Apparent and Real personality, (b)
the subjects’ responses to the assumed biasing
effect of the preinformation stimulus, and (c)
the subjects’ own assessment of the feeling
of certainty of their rating of the stimulus person.
Predictions in this study were tested in
terms of three types of scores derived from
the procedure just described.

Favorableness Score. The favorableness
score for each subject was based on the
“absolute” value of his ratings on each of 18
traits. Six spaces on a standard rating scale
were assigned a value from 1 to 6; 1 for the
most unfavorable rating, and 6 for the most
favorable rating. The favorableness score is a
simple sum of these absolute values of the
subject’s ratings on all 18 traits.
Discrepancy Score. The discrepancy score for each subject was derived from the differences between the subject’s rating of the Apparent aspect of the judged person and his rating of the Real aspect of the judged person, trait by trait. The discrepancy score is a relative score: it is independent of the absolute position on the scale of the two sets of ratings and is based on their relative position. The discrepancy score for any one subject is the arithmetic sum of the Apparent-Real differences in his ratings on a series of 18 traits. This means that the discrepancy score is independent of the direction of the difference in rating between Apparent and Real levels of personality.

Any one subject can be given only one discrepancy score, because the discrepancy score is an expression of the relative difference between his rating of the Apparent and Real personality. However, every subject will have two favorableness scores, because the favorableness score is an expression of the absolute favorableness of the rating of the Apparent personality and that of the Real personality, independent of each other.

To facilitate the conceptualization of the difference between the discrepancy and the favorableness scores, an illustration of a subject’s hypothetical rating on four traits for both Apparent and Real personality will be presented:

[Illustration: hypothetical ratings marked on four six-space scales (RUDE-POLITE, STUPID-WISE, STRONG-WEAK, SLOW-QUICK), shown separately for the Apparent and the Real personality.]




TABLE 3

MEAN FAVORABLENESS SCORES FOR ALL GROUPS DERIVED FROM THE RATING OF THE AVERAGE AMERICAN

                                Apparent           Real
Group                           M       SD       M       SD
Well-adjusted
  Positive preinformation     78.15    7.60    74.40    8.14
  No preinformation           79.15    9.16    72.10    7.51
  Negative preinformation     76.50   10.13    70.30    8.93
Maladjusted
  Positive preinformation     73.50   10.51    63.10   10.25
  No preinformation           73.75    9.91    62.70    9.02
  Negative preinformation     73.75   11.03    64.85   11.04


The discrepancy score of this subject is 12; his favorableness score for the Apparent personality is 14; his favorableness score for the Real personality is also 14. From this, one can see that the difference between a subject’s two favorableness scores is not equal to his discrepancy score. In this illustration the difference between the two favorableness scores is 0 but the discrepancy score is 12. If each of the ratings in the above example were shifted one step to the left or to the right, the favorableness scores would change accordingly while the discrepancy score would remain unchanged. Thus we can say that favorableness measures and discrepancy measures reflect different aspects of rating and are, at least conceptually, two independent measures.
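
The relation between the two scores can be sketched in a few lines of Python. The ratings below are hypothetical values chosen only so that they reproduce the totals quoted above (14, 14, and 12); they are not read from the illustration, and the function names are illustrative. Each rating is assumed to be already expressed as its favorableness value from 1 to 6, and the discrepancy score is taken as the sum of unsigned trait-by-trait differences, in keeping with its stated independence of the direction of the difference.

```python
def favorableness(ratings):
    """Sum of the favorableness values (1 to 6) over all rated traits."""
    return sum(ratings)


def discrepancy(apparent, real):
    """Sum of unsigned Apparent-Real differences, trait by trait."""
    return sum(abs(a - r) for a, r in zip(apparent, real))


# Hypothetical ratings on four traits (assumed values, not the source's marks).
apparent = [2, 5, 5, 2]
real = [5, 2, 2, 5]

print(favorableness(apparent), favorableness(real))  # 14 14
print(discrepancy(apparent, real))                   # 12

# Shifting every rating one step changes favorableness but not discrepancy.
shifted_apparent = [a + 1 for a in apparent]
shifted_real = [r + 1 for r in real]
print(favorableness(shifted_apparent))               # 18
print(discrepancy(shifted_apparent, shifted_real))   # 12
```

Two equal favorableness scores can therefore coexist with a large discrepancy score, which is the sense in which the two measures are treated as conceptually independent.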

Certainty Score. The certainty score is conceptually independent of the favorableness and discrepancy scores, because it is based on a different operation from other ratings in this study. It is not a judgment that the subject makes about someone else but is rather a judgment that he makes about himself while judging others. The subject indicated his own feeling of certainty for every rating of the stimulus person he made, by numbers from 1 to 4: 1 for “very uncertain” and 4 for “very certain.” His total certainty score was the sum of the scores thus assigned. Each subject received a separate certainty score for the Apparent and for the Real levels of personality.


The comparability of the groups in this ex- 
periment within each of the two adjustment 
classifications was checked by comparing the 
mean favorableness scores of their ratings of 
the Average American. Up to that point in 
the experiment, no differential treatment of 
groups had been introduced. Table 3 presents 
the mean favorableness scores for all groups 
derived from the ratings of the Average Ameri- 
can. 

Inspection of Table 3 shows that the com- 
parability of the maladjusted groups is almost 
ideal. It also shows that the variation among 
means of the well-adjusted groups is small, i.e., 
within the range of 3 points on the Apparent 
level and within the range of 4 points on the 
Real level of personality. 

Hypothesis 1, which predicts that the discrepancy between the subjects’ rating of the Real and Apparent aspects of the judged person will be significantly greater in maladjusted than in well-adjusted groups, has been substantiated in two ways: (a) By a trend analysis of variance of discrepancy scores for all subjects, derived from their rating of the stimulus person. A summary of this analysis is presented in Table 4. The significant (p<.001) adjustment effect shows that the means for well-adjusted and maladjusted groups averaged over the kinds of preinformation differ significantly. Inspection of means (Table 5) shows that the discrepancy between the rating of the Apparent and the Real personality is larger in the maladjusted than in the well-adjusted group; this is true with all three kinds of preinformation. (b) By an analysis of the discrepancy scores obtained from the same subjects but based on their rating of the Average American. The discrepancy score of the maladjusted group was found to be significantly larger than the discrepancy score of the well-adjusted group (t=6.00, df=119, p<.001).

TABLE 4

ANALYSIS OF VARIANCE OF DISCREPANCY SCORES DERIVED FROM THE RATINGS OF THE STIMULUS PERSON

Source                           df       MS        F
Adjustment                        1    1968.30    26.51*
Preinformation                    2     944.36    12.72*
Adjustment x Preinformation       2      41.49
Within                          114      74.23

*p < .001.

TABLE 5

MEAN DISCREPANCY SCORES DERIVED FROM THE RATING OF THE STIMULUS PERSON

                    Positive           No                 Negative
                    preinformation     preinformation     preinformation
Subjects            M       SD         M       SD         M       SD
Well-adjusted       7.40    3.57       5.40    4.84      15.75    9.26
Maladjusted        14.40    9.78      15.85    8.36      22.6    11.65

Hypothesis 2, which predicts that all subjects will rate the Real person less favorably than the Apparent person, was supported both by a trend analysis of favorableness scores of all subjects based on their rating of the stimulus person and by a trend analysis of favorableness scores based on their rating of the Average American.

Table 6 presents a summary of the analysis of favorableness scores, derived from the ratings of the stimulus person.

TABLE 6

TREND ANALYSIS OF VARIANCE OF FAVORABLENESS SCORES DERIVED FROM THE RATINGS OF THE STIMULUS PERSON

Source                                                 df       MS        F
Adjustment                                              1    1627.40     8.93*
Preinformation                                          2    2775.20    15.22**
Adjustment x Preinformation                             2     112.65
Error (a)                                             114     182.35
Levels of Personality                                   1    3031.6     56.24**
Adjustment x Level of Personality                       1     592.5     11.00*
Preinformation x Level of Personality                   2     287.8      5.35*
Adjustment x Preinformation x Level of Personality      2      16.25
Error (b)                                             114      53.90

*p < .01.
**p < .001.

TABLE 7

MEAN FAVORABLENESS SCORES DERIVED FROM THE RATINGS OF THE STIMULUS PERSON

                                Apparent           Real
Group                           M       SD       M       SD
Well-adjusted
  Positive preinformation     88.95    9.94    87.45    8.42
  No preinformation           85.95    9.33    82.70   10.17
  Negative preinformation     78.20   11.77    71.05   13.52
Maladjusted
  Positive preinformation     85.00    8.31    78.55    9.69
  No preinformation           82.00   13.18    73.20   10.32
  Negative preinformation     79.90    8.22    64.40   11.95

The significant (p<.001) effect (Table 6) for levels of personality indicates that the means for the rating of the Apparent personality and the rating of the Real personality, averaged over the levels of adjustment and levels of preinformation, differ significantly. Examination of means (Table 7) shows that the Apparent personality was seen more favorably than the Real personality.
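
As a quick arithmetic check, each F ratio in Table 4 is the mean square for an effect divided by the within-groups mean square. The sketch below recomputes the ratios from the tabled mean squares; the variable names are illustrative and not part of the original analysis.

```python
# Mean squares from Table 4 (discrepancy scores, stimulus person).
ms_within = 74.23
mean_squares = {
    "Adjustment": 1968.30,
    "Preinformation": 944.36,
    "Adjustment x Preinformation": 41.49,
}

# F ratio = effect mean square / within-groups mean square.
f_ratios = {name: ms / ms_within for name, ms in mean_squares.items()}

for name, f in f_ratios.items():
    print(f"{name}: F = {f:.2f}")
```

The recomputed values agree with the tabled F ratios to within rounding. The same division, with Error (a) as the denominator for between-subjects effects and Error (b) for effects involving levels of personality, closely reproduces the F ratios reported for the favorableness scores in Table 6.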

A summary of the analysis of the favora- 
bleness scores obtained from the same sub- 
jects but based on their ratings of the Aver- 
age American is given in Table 8. 

The significant (p<.001) levels of person- 
ality effect (Table 8) and the examination of 
means lends further support to Hypothesis 2, 
in that it shows that the Apparent aspect of 
personality was seen by all subjects more 
favorably than the Real aspect of personality. 

TABLE 8

TREND ANALYSIS OF VARIANCE OF FAVORABLENESS SCORES DERIVED FROM THE RATINGS OF THE AVERAGE AMERICAN

Source                                   df       MS        F
Adjustment                                1    2413.00    30.54*
Error (a)                               118      79.00
Levels of Personality                     1    3596.10    35.86*
Adjustment x Levels of Personality        1     258.25     2.57
Error (b)                               118     100.25

*p < .001.

The significant (p<.01) Adjustment x Levels of Personality interaction (Table 6) is a straightforward confirmation of Hypothesis 3, which predicts that in the rating of the Apparent personality of the stimulus person the difference between well-adjusted subjects and maladjusted subjects will be smaller than the difference in the rating of the Real aspect of the stimulus person. This finding indicates that the adjustment effect is not independent of the levels of personality factor. In other words, the magnitude of the difference between well-adjusted and maladjusted groups is not the same for the rating of the Apparent personality and the rating of the Real personality. Examination of means (Table 7) shows that the difference between maladjusted and well-adjusted groups is greater in the rating of the Real personality than in the rating of the Apparent personality of the stimulus person. Figure 1 presents the form of the Adjustment x Levels of Personality interaction.
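
The form of this interaction can be recovered from the means in Table 7. The following sketch averages the tabled means over the three kinds of preinformation and compares the adjustment groups at each level of personality. It assumes equal group sizes, so that an unweighted average is appropriate; the dictionary layout and names are illustrative.

```python
from statistics import mean

# Mean favorableness scores from Table 7, in the order
# positive, no, negative preinformation.
table7 = {
    ("well-adjusted", "Apparent"): [88.95, 85.95, 78.20],
    ("well-adjusted", "Real"): [87.45, 82.70, 71.05],
    ("maladjusted", "Apparent"): [85.00, 82.00, 79.90],
    ("maladjusted", "Real"): [78.55, 73.20, 64.40],
}

# Unweighted average over preinformation (assumes equal group sizes).
cell = {key: mean(values) for key, values in table7.items()}

gaps = {}
for level in ("Apparent", "Real"):
    gaps[level] = cell[("well-adjusted", level)] - cell[("maladjusted", level)]
    print(f"{level}: well-adjusted minus maladjusted = {gaps[level]:.2f}")
```

Under these assumptions the gap between the adjustment groups is roughly 2 points at the Apparent level and roughly 8 points at the Real level, which is the pattern the interaction describes.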

FIG. 1. Mean favorableness scores for well-adjusted and maladjusted subjects on the Apparent and the Real levels of personality. (Horizontal axis: levels of personality, Apparent and Real.)

The significant (p<.01) adjustment effect in the analysis of favorableness scores derived from the ratings of the stimulus person (Table 6) indicates that the means for the well-adjusted and maladjusted groups, averaged over kinds of preinformation and the levels of personality, differ significantly. Examination of means (Table 7) shows that maladjusted subjects rate the stimulus person less favorably than the well-adjusted subjects. The significant (p<.001) adjustment effect in the analysis of the favorableness scores derived from the ratings of the Average American (Table 8) and the examination of means strengthen the above finding that maladjusted subjects rate the judged person less favorably than the well-adjusted subjects. These findings, although not explicitly formulated in a separate hypothesis, involve one of the basic assumptions of this study; on a more analytical level this assumption is implied in Hypotheses 2 and 3. Hypothesis 2 predicts that maladjusted subjects will rate the Real level of personality less favorably than the well-adjusted subjects. Hypothesis 3 states that the difference between well-adjusted and maladjusted groups will be smaller on the Apparent level of personality than on the Real level of personality. Together these two hypotheses predict that maladjusted subjects will rate the stimulus person less favorably than the well-adjusted subjects. This is precisely the meaning of the two above findings. Accordingly, these findings support Hypotheses 2 and 3.

Hypothesis 4, which expressed the expectation that the maladjusted subjects will be more influenced by the preinformation stimulus than the well-adjusted subjects, has not been borne out by the findings of this study. The nonsignificant Adjustment x Preinformation interaction (Table 6) indicates that the adjustment effect, i.e., the rating difference between well-adjusted and maladjusted groups, is independent of the direction of preinformation. This means that the difference between maladjusted and well-adjusted groups is approximately the same regardless of the kind of preinformation.

The first part of Hypothesis 5, which predicts that preinformation will have a significant effect on the subjects’ rating of the stimulus person, regardless of the adjustment of the subjects, has been substantiated by a significant (p<.001) preinformation effect in the analysis of variance of favorableness scores, derived from the ratings of the stimulus person (Table 6). This finding indicates that the means for positive preinformation, no preinformation, and negative preinformation, when averaged over the levels of adjustment and levels of personality, differ significantly.

In order to determine more specifically which of the differences between the means (Table 7) in this analysis of variance are significant, Duncan’s new multiple-range test was applied. This test showed that the difference between the means of the negative preinformation and no preinformation groups is highly significant (p<.001), but the difference between the means of the positive preinformation and no preinformation groups does not reach significance. Thus the prediction contained in the first part of Hypothesis 5 was found to apply to negative, but not to positive, preinformation.

The second part of Hypothesis 5, which specifies that preinformation will have a particularly significant effect on the rating of the Real aspect of the stimulus person, was partially supported by a significant (p<.001) preinformation mean square (Table 4) in the analysis of variance of discrepancy scores, derived from the ratings of the stimulus person. This finding indicates that the means of discrepancy scores for different conditions of preinformation are not the same. However, examination of means (Table 5) shows that while the mean for the negative preinformation is larger than the mean for no preinformation, the latter is just about equal to the mean for positive preinformation. This finding suggests that for both well-adjusted and maladjusted subjects under the condition of negative preinformation the tendency to differentiate between the Apparent and the Real personality is accentuated. Thus the prediction expressed in the second part of Hypothesis 5 was found to apply to negative, but not to positive, preinformation.
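
The comparison of discrepancy means across conditions can be made concrete with the values in Table 5. The sketch below averages the well-adjusted and maladjusted means within each kind of preinformation; it assumes equal group sizes, and the names are illustrative.

```python
# Mean discrepancy scores from Table 5, keyed by kind of preinformation.
table5 = {
    "positive": {"well-adjusted": 7.40, "maladjusted": 14.40},
    "none": {"well-adjusted": 5.40, "maladjusted": 15.85},
    "negative": {"well-adjusted": 15.75, "maladjusted": 22.6},
}

# Unweighted average of the two adjustment groups (assumes equal n).
marginal = {
    condition: sum(means.values()) / len(means)
    for condition, means in table5.items()
}

for condition, value in marginal.items():
    print(f"{condition}: mean discrepancy = {value:.1f}")
```

On these assumptions the positive and no preinformation conditions both average a little under 11 points, while the negative condition averages about 19 points, in line with the description above.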

A different version of the same phenomenon was found in the significant (p<.01) Preinformation x Levels of Personality interaction in the analysis of favorableness scores derived from the ratings of the stimulus person (Table 6). This finding indicates that the levels of personality effect is dependent on the direction of the preinformation factor. In other words, the magnitude of the difference between levels of personality is not the same for all three types of preinformation. Examination of means (Table 7) shows that the difference between the Apparent and the Real personality is greater when the preinformation is negative than when there is no preinformation or when the preinformation is positive. Figure 2 presents the form of the Preinformation x Levels of Personality interaction.

FIG. 2. Mean favorableness scores for levels of personality, on different types of preinformation. (Separate curves for Apparent and Real; horizontal axis: types of preinformation, Pos., No, Neg.)

Both these findings are in partial support of the second part of Hypothesis 5.

Hypothesis 6, which predicts that maladjusted subjects will be more certain of their rating than well-adjusted subjects, was found to be true, but in view of the complexity of an obtained interaction, Hypothesis 6 needs some qualifications. Table 9 presents a summary of the trend analysis of the certainty scores obtained from the WPP, WNP, MPP, and MNP groups with respect to the Apparent and the Real personality of the stimulus person.

Inspection of Table 9 shows that there is no 
overall adjustment effect. 

The significant (p<.01) preinformation 
effect (Table 9) indicates that the means for 
positive and negative preinformation, averaged 
over the levels of adjustment and the levels of 
personality, differ significantly. Examination 
of means (Table 10) shows that under the 
condition of positive preinformation the sub- 
jects are more certain of their rating than 
under the condition of negative preinforma- 
tion. 

TABLE 9

TREND ANALYSIS OF VARIANCE OF CERTAINTY SCORES

Source                                                 df       MS        F
Adjustment                                              1     146.64     1.94
Preinformation                                          1     599.45     7.94**
Adjustment x Preinformation                             1     446.25     5.91*
Error (a)                                              60      75.45
Levels of Personality                                   1     670.70    26.00***
Adjustment x Level of Personality                       1       6.56
Preinformation x Level of Personality                   1      79.69     3.09
Adjustment x Preinformation x Level of Personality      1     164.26     6.36*
Error (b)                                              60      25.80

*p < .05.
**p < .01.
***p < .001.

The significant (p<.05) Adjustment x Preinformation interaction (Table 9) indicates that the preinformation effect is not the same for well-adjusted and for maladjusted groups. Examination of means (Table 10) shows that well-adjusted groups are more certain than maladjusted groups when the preinformation is positive, and relatively less certain when the preinformation is negative. However, on a more analytical level, when the triple interaction (Adjustment x Preinformation x Levels of Personality) is considered, the form of the interaction appears to be much more complex.

TABLE 10

MEAN CERTAINTY SCORES OBTAINED ON THE RATINGS OF THE STIMULUS PERSON

                                Apparent           Real
Group                           M       SD       M       SD
Well-adjusted
  Positive preinformation     56.56    7.16    55.62    7.05
  No preinformation           51.75   13.34    41.15   12.74
  Negative preinformation     49.18    5.77    44.37    4.47
Maladjusted
  Positive preinformation     57.68    5.32    48.81    8.34
  No preinformation           52.06   13.32    47.68   10.33
  Negative preinformation     53.25    5.77    52.06    8.39

The highly significant (p<.001) levels of personality mean square (Table 9) indicates that the means for levels of personality averaged over the types of preinformation and the levels of adjustment differ significantly. Examination of means (Table 10) shows that the subjects are more certain when rating the Apparent than when rating the Real level of personality.

The interaction between levels of personal- 
ity and adjustment effect, and the interaction 
between levels of personality and types of pre- 
information (Table 9), when considered sepa- 
rately, are not significant. However, the 
Adjustment x Preinformation x Levels of Per- 
sonality interaction is significant (p<.05). 
This means that the Adjustment x Preinfor- 
mation interaction discussed above does not 
have the same form with different levels of 
personality. Figure 3 presents the form of 
this triple interaction. 

Examination of means (Table 10) shows that maladjusted subjects are more certain than the well-adjusted subjects when rating the Apparent personality. This aspect of the findings is in agreement with Hypothesis 6. This difference in certainty between the two adjustment groups is greater with the negative than with the positive preinformation. However, in the rating of the Real aspect of the stimulus person (Figure 3) the relationship between maladjusted and well-adjusted groups becomes more complex. Well-adjusted subjects show a tendency to be more certain with the positive than with the negative preinformation, as was the case in their rating of the Apparent person. Maladjusted subjects, however, behave in the opposite way, which is also the opposite of the tendency they showed in their rating of the Apparent personality. In the rating of the Real person, maladjusted subjects have more confidence with the negative than with the positive preinformation.

FIG. 3. Mean certainty scores of WPP, WNP, MPP, and MNP groups, on the Apparent and the Real levels of personality. (Panels: Apparent, Real; vertical axis: mean certainty scores; horizontal axis: Pos. Pr., Neg. Pr.)

It should be mentioned that when data from 
the two no preinformation groups are added to 
the analysis of variance of the certainty scores, 
the significance of the Adjustment x Preinfor- 
mation x Levels of Personality interaction dis- 
appears. This is probably due to the fact that 
in the absence of the preinformation stimulus 
the well-adjusted group (WnP) and the mal- 
adjusted group (MnP) do not differ in terms 
of the certainty scores in either of the levels of 
personality (Figure 4) and also because the 
variance in both of these groups is relatively 
very high (Table 10). It seems that the pres- 
ence of the preinformation stimulus provides 
for a greater uniformity of reaction of the 
groups and reduces the variance. It should 
be noted that the difference between the means 
of the well-adjusted and maladjusted groups 
(WNP and MNP) is significant when the pre- 
information is negative. This is the case in 
both the rating of the Apparent personality 
(t=1.98, df=30, p<.05) and in the rating 
of the Real personality (t=3.10, df=30, 
p<.001). The differences between means of 
the certainty scores of the same groups do not 
reach statistical significance, when the pre- 
information is positive. 
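
The two t ratios for the negative preinformation groups can be approximated from the summary statistics in Table 10. The sketch below is only a rough check: it assumes 16 subjects per group (inferred from df=30 with two equal groups) and treats the tabled standard deviations as sample standard deviations, neither of which is stated in the text. The function name is illustrative.

```python
from math import sqrt


def t_from_summary(mean_a, sd_a, mean_b, sd_b, n_per_group):
    """Independent-groups t for equal n, computed from means and SDs."""
    standard_error = sqrt((sd_a**2 + sd_b**2) / n_per_group)
    return (mean_a - mean_b) / standard_error


n = 16  # assumed: df = 30 with two equal groups implies 16 subjects each

# Table 10, negative preinformation: maladjusted (MNP) vs. well-adjusted (WNP).
t_apparent = t_from_summary(53.25, 5.77, 49.18, 5.77, n)
t_real = t_from_summary(52.06, 8.39, 44.37, 4.47, n)

print(round(t_apparent, 2), round(t_real, 2))
```

Under these assumptions the recomputed values come out near 2.0 and 3.2, close to but not identical with the reported 1.98 and 3.10; the small differences may reflect rounding of the tabled values or a different variance estimate in the original computation.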

With respect to the effect of the negative preinformation on the subjects’ ratings, which was found to be highly significant (Table 6), one could legitimately raise the question of the extent to which it reflects only changes in the rating of those traits contained in the preinformation stimulus. In other words, one may want to know the degree of generalization from the three traits given in the preinformation to the other 15 traits on the rating sheet.

FIG. 4. Mean certainty scores for all six groups on the Apparent and the Real levels of personality. (Panels: Apparent, Real; horizontal axis: Pos. Pr., No Pr., Neg. Pr.)

For this purpose each subject’s ratings of 
the 3 traits given as negative preinformation 
were separated from his ratings on the other 
15 traits on the rating sheet, and an analysis of 
variance of the 15 trait scores was carried out. 
Well-adjusted and maladjusted groups with 
the same kind of preinformation were treated 
together as one group. Only ratings of the 
Real level of personality were utilized. If the 
effect of the preinformation was limited only 
to the 3 traits contained in the preinformation, 
then one would expect to find no difference 
between the means of the ratings on the re- 
maining 15 traits on the rating sheet. 

The analysis of variance of these 15 trait scores, with a significant (p<.001) between-groups mean square, indicates that the effect of preinformation on the subjects’ ratings was not limited to the 3 traits contained in the preinformation stimulus but was generalized to the ratings of the other 15 traits on the rating sheet.


DISCUSSION


The following is a summary of the most 
significant findings of this investigation: 

Apparent-Real Dichotomy. (a) The dis- 
crepancy in the ratings of the Apparent and 
the Real levels of personality is significantly 
larger in maladjusted than in well-adjusted 
subjects. (b) The difference between malad- 
justed and well-adjusted subjects in the rating 
of the Apparent personality is minimal. (c) 
The difference between maladjusted and well- 
adjusted subjects in the rating of the Real 
level of personality is large. 

Preinformation. (a) Negative preinforma- 
tion has a strong effect on the subjects’ rating. 
(b) The biasing effect of preinformation is 
the same for maladjusted and well-adjusted 
subjects, hence it is not a function of adjust- 
ment. (c) The effect of the information con- 
tained in the negative preinformation tends to 
reflect upon traits not contained in the pre- 
information. (d) This tendency to generalize 
from the preinformation traits to the other 
traits does not differentiate between well- 
adjusted and maladjusted subjects. 




Certainty. (a) With respect to the Apparent 
level of personality, maladjusted subjects are 
more certain of their ratings than well-adjusted 
subjects. (b) With respect to the Real level 
of personality, maladjusted subjects are less 
certain than well-adjusted subjects when the 
preinformation is positive, but they are more 
certain of their ratings than well-adjusted 
subjects when the preinformation is negative. 

Findings related to the differentiation between the Apparent and the Real levels of personality in the ratings of maladjusted and well-adjusted subjects support Hypotheses 1, 2, and 3. These findings were demonstrated repeatedly in different experimental conditions of this study, i.e., in the rating of the stimulus person, in each type of preinformation, and also in the rating of the Average American.

One may say that subjects in this study differentiated between the Apparent and the Real personality primarily because they were asked to do so. This is naturally true, in the same sense that it is true that the subjects interpreted the inkblots because they were asked to. The important point here, however, is that the well-adjusted subjects went about doing what they were asked to do in a way that is consistently different from the way of the maladjusted subjects. This suggests that the failure to control the Real-Apparent dichotomy in any experiment involving rating of people would be likely to result in a blurring of the measurement of the variable under investigation.

The findings in regard to the discrepancy 
between the Real and the Apparent levels of 
personality indicate that the major difference 
in perception between well-adjusted and mal- 
adjusted individuals is in the Real, not in the 
Apparent, aspect of perception. The differ- 
ence, in terms of the Apparent aspect of 
perception, is minimal. This suggests the 
possibility that accuracy of perception in the 
narrow sense of the word, is not the variable 
that could contribute to the better under- 
standing of the differences in adjustment 
among the normal population. 

The marked differences between well-adjusted and maladjusted individuals in the perception of the Real personality are likely to have important behavioral implications. The tendency to see the Real and the Apparent personality of others as being far apart is probably a reflection of a basic attitude of suspiciousness of an individual toward people in general. Individuals with a high discrepancy score would be expected to behave in a way that does not appear warranted to others and therefore seems strange, bizarre, and incomprehensible from the viewpoint of other people. Suspiciousness and bizarreness are usually considered to be attributes of the maladjusted rather than well-adjusted individuals.

It should be understood that the difference between well-adjusted and maladjusted subjects in the rating of the Real personality is said to be large only in relation to the much smaller difference in rating of the Apparent personality. The “absolute” magnitude of the differences is actually small and could be described adequately only in terms of degrees of a given trait and almost never in terms of trait opposites. An inspection of the group means (Table 11) for each of the 18 traits of well-adjusted and maladjusted groups discloses that the mean scores of well-adjusted and maladjusted groups are in almost all instances both on the same side of the trait scale, the score of the maladjusted subjects being just slightly but consistently below that of the well-adjusted subjects. It should be apparent that Asch’s method of forcing the subject to choose between positive and negative trait opposites would eliminate or exaggerate these subtle differences.

With respect to Hypotheses 4 and 5, which deal with the biasing effect of preinformation, the following was found: While positive preinformation has little or no effect on the subjects’ rating, negative preinformation has a strong effect in the negative direction. This finding is in partial support of Hypothesis 5. The biasing effect of preinformation, however, was found to be the same for maladjusted and well-adjusted subjects, hence it could not be a function of adjustment. This finding is in disagreement with Hypothesis 4.

TABLE 11

COMPARISON OF MEAN TRAIT FAVORABLENESS SCORES OF WELL-ADJUSTED AND MALADJUSTED GROUPS IN THE RATING OF THE REAL PERSONALITY OF THE STIMULUS PERSON

                                   Positive             No                   Negative
                                   preinformation       preinformation       preinformation
                                   Well-      Mal-      Well-      Mal-      Well-      Mal-
Trait                              adjusted   adjusted  adjusted   adjusted  adjusted   adjusted
 1. Lazy-Industrious               5.25       4.95      4.95       4.35      4.70       4.25
 2. Scrupulous-Unscrupulous        4.30       4.00      3.70       3.25      3.20       3.25
 3. Humane-Ruthless                4.45       3.90      4.40       3.90      3.45       3.35
 4. Sociable-Unsociable            3.75       2.90      3.10       2.65      2.60       2.55
 5. Selfish-Unselfish              4.40       4.40      4.70       3.80      4.00       3.50
 6. Insincere-Sincere              5.10       4.60      5.05       4.65      4.35       3.55
 7. Serious-Frivolous              4.05       3.85      3.65       3.40      2.95       2.95
 8. Unintelligent-Intelligent      4.80       4.30      4.50       3.65      4.00       3.80
 9. Practical-Impractical          3.95       3.55      3.75       2.95      2.95       3.10
10. Irritable-Goodnatured          4.45       3.60      4.60       3.85      3.90       3.30
11. Profound-Shallow               3.40       2.85      3.10       2.25      2.10       1.85
12. Conceited-Unassuming           3.95       3.60      4.30       3.95      3.10       2.95
13. Mature-Immature                4.05       3.60      3.50       3.30      3.05       2.45
14. Evasive-Direct                 5.05       4.20      4.50       3.70      3.95       3.80
15. Blunt-Polite                   5.25       4.30      4.80       4.60      4.35       3.10
16. Thoughtful-Impulsive           4.05       3.90      3.65       3.25      2.60       2.10
17. Warm-Cold                      4.15       2.90      3.60       3.15      2.90       2.10
18. Dependent-Independent          4.05       3.75      3.85       3.55      3.10       3.45

These findings are interesting in connection with the widely held opinion that better adjustment is a result of a more accurate interpersonal perception (see, however, Steiner, 1955). This opinion finds little support in this study. In this study, maladjusted and well-adjusted subjects responded to the preinformation stimulus by shifting their ratings about equal distances from the position of their own control group. If the difference between well-adjusted and maladjusted subjects in rating of the stimulus person were an expression of a greater accuracy of perception on the part of the well-adjusted subjects, one would expect the well-adjusted groups to resist the biasing suggestion of the preinformation stimulus more successfully than the maladjusted groups. There is little evidence of a relatively greater resistance of well-adjusted groups to the perceptual biasing effect in this investigation.

If some individuals do perceive more accu- 
rately than others, this differential accuracy 
among individuals does not appear to be a 
function of adjustment. The common observa- 
tion that some maladjusted individuals seem 
to be very perceptive of others is in agreement 
with this theory. 

Preinformation as used in this experiment 
has considerable similarity to important social 
stimuli such as recommendation, slander, 
propaganda, indoctrination, etc. On the basis 
of this similarity and generalizing from find- 
ings of this study with respect to preinforma- 
tion, one may speculate that well-adjusted and 


maladjusted individuals are equally susceptible 
to the biasing effect of the “preinformation- 
like” social stimuli. This, of course, does not 
mean that well-adjusted and maladjusted indi- 
viduals would behave alike when under the 
influence of such social stimuli. The original 
difference in the amount of the differentiation 
between the Real and the Apparent levels of 
personality of well-adjusted and maladjusted 
individuals would remain unmodified by the 
effect of the preinformationlike social stimuli. 
This important difference in the Real-Ap- 
parent dichotomy is believed to be sufficient to 
produce essential differences in the behavior of 
well-adjusted and maladjusted individuals, in 
spite of the effect of the preinformationlike 
social stimuli. 

Another finding with respect to the effect of 
the preinformation indicates that while posi- 
tive preinformation has little if any effect on 
the subjects’ ratings, negative preinformation 
has a strong effect and tends to reflect upon 
traits not contained in the preinformation. 
This finding may also have interesting be- 
havioral implications. Applied to everyday life 
this finding suggests that when we hear some- 
thing positive about another person, we pro- 
ceed cautiously and our opinion of the other 




person tends to remain the same. However, 
when the received information is negative we 
are inclined to form a more negative opinion 
of the other person as a whole. This tendency, 
however, does not differentiate between well- 
adjusted and maladjusted individuals; it seems 
to be common to both. In other words it seems 
to be much easier to damage a person’s repu- 
tation by slander than to build it up by 
recommendation. 

The extent of generalization from the find- 
ings of this study to the possible behavioral 
implications should be restricted by the fact 
that all of the data in this investigation are 
related only to the first impression of people. 
Important as the first impression may be in 
everyday life, what comes after the initial im- 
pression is also extremely important. The 
analysis of the certainty scores suggests that 
what happens after the first impression could 
be essential for the understanding of the more 
subtle differences in adjustment. The degree 
of certainty of one’s impressions is probably 
only the first step of divergence between mal- 
adjusted and well-adjusted behavior. Con- 
sideration of the factors such as the ability to 
change one’s impressions upon further evi- 
dence or the willingness to search for addi- 
tional information would introduce new and 
more subtle differentiations within the group 
of individuals called well-adjusted. 

The findings with respect to certainty scores 
are generally in agreement with Hypothesis 6. 
They indicate that well-adjusted and mal- 
adjusted subjects behave differently and to a 
large extent in a directly opposite manner in 
terms of their feelings of certainty. In the 
perception of the Real level of personality mal- 
adjusted subjects are less certain than well- 
adjusted subjects with positive preinformation, 
but they are considerably more certain of their 
ratings than well-adjusted subjects when the 
preinformation is negative. Thus, although 
Hypothesis 4 which predicted a stronger bias- 
ing effect of preinformation on the ratings of 
maladjusted subjects than on the ratings of 
well-adjusted subjects has not been borne out 
by the results, one may still conclude that pre- 
information does have a differential effect on 
well-adjusted and maladjusted subjects. This 
effect, however, is not manifested in their rat- 
ings, but only in the feeling of certainty. 


The tendency of the maladjusted subjects to 
feel certain about negative impressions of 
people is most likely to result in numerous dif- 
ficulties for them in interpersonal relations. 
The tendency of the well-adjusted subjects to
feel uncertain and confused in a similar situa- 
tion will have the opposite effect of reducing 
the incidence of “mistakes” in social inter- 
actions. 

Results seem to indicate that one of the 
basic differences between well-adjusted and 
maladjusted individuals lies in the kinds of 
expectations they have of other people. The 
data suggest that the maladjusted individuals 
expect others to be “bad.” Accordingly, they 
give more credence to the negative preinforma- 
tion while the well-adjusted individual expects 
others to be “good” and feels more at home 
with the positive impression of others. 

It was not anticipated that the certainty 
variable would result in such complex inter- 
actions with the variables of adjustment and 
levels of personality. For this reason these 
newly found and important relationships need 
to be further investigated. 


SUMMARY 


This study is an attempt to investigate the 
relationship between some aspects of interper- 
sonal perception and adjustment through the 
analysis of the process of impression forma- 
tion. 

It was speculated that “accuracy” of percep- 
tion is not likely to be a variable that is 
functionally related to the dimension of adjust- 
ment. Instead it was hypothesized that proc- 
esses such as the differentiation between the 
Real and the Apparent levels of personality, 
resistance to the biasing effect of expectations 
upon interpersonal perception, and the feeling 
of certainty of one’s impressions of others are 
the processes which are more likely to be 
responsible for the differences between well- 
adjusted and maladjusted behavior. 

In order to test these hypotheses, 120 college 
freshmen were divided into well-adjusted 
and maladjusted groups. First they were asked 
to rate the Average American in terms of what 
he “appears” to be and separately in terms 
of what he “really” is on 18 6-point trait 
scales, made up of trait opposites. In the 




second experimental session one third of all 
the subjects viewed a person (stimulus per- 
son) on the screen after having been given a 
favorable brief description (preinformation) 
of the stimulus person (e.g., industrious, polite, 
and warm). Another third of the subjects 
viewed the stimulus person after an unfavora- 
ble description (e.g., lazy, blunt, and cold) 
and the remainder of the subjects viewed the 
film without any preinformation. In each pre- 
information condition there was an equal num- 
ber of well-adjusted and maladjusted subjects. 
After the film viewing the subjects were asked 
to rate the stimulus person in the same way 
they had rated the Average American, but to 
base their judgment only on what they saw on 
the screen. Simultaneously the subjects were 
asked to indicate the degree of certainty for 
each rating they made. 

The findings indicate that the discrepancy 
in perception between the Real and the Ap- 
parent levels of personality is reliably larger in 
maladjusted than in well-adjusted subjects; 
that the difference between well-adjusted and 
maladjusted subjects in the perception of the 
Apparent personality is minimal, while the 
difference in the perception of the Real per- 
sonality, between well-adjusted and malad- 
justed subjects, is consistently large. Be- 
havioral implications of the Real-Apparent 
discrepancy were discussed. It was suggested 
that this discrepancy is a reflection of an 
individual’s general attitude of suspiciousness 
toward other people and could be considered as 
a measurement of such attitude. The necessity 
to control the Apparent-Real dichotomy in any 
experiment on interpersonal perception was 
pointed out. 

With respect to preinformation it was found 
that positive preinformation has no effect, 
while negative preinformation has a definite 
effect on ratings in the expected direction. It 
was suggested that this finding may imply 
that the damaging effect of slander is more 


powerful than the beneficial effect of recom- 
mendation. The preinformation stimulus was 
likened to some important “preinformation- 
like” social stimuli, but it was pointed out 
that this does not mean that the reaction of 
well-adjusted individuals to these stimuli is 
the same as the reaction of the maladjusted 
individuals. 

In addition it was found that preinformation 
does not remain specific to the traits men- 
tioned in the preinformation, but tends to 
generalize to the other traits. 

The analysis of the certainty scores dis- 
closed a second order interaction between ad- 
justment, preinformation, and the levels of 
personality. It was found that in the rating of 
the Apparent levels of personality maladjusted 
subjects are more certain of their ratings with 
both positive and negative preinformation. 
However, in the rating of the Real levels of 
personality maladjusted subjects are much 
more certain than well-adjusted subjects when 
preinformation is negative, but are less certain 
than well-adjusted subjects when the prein- 
formation is positive. This was interpreted 
in terms of the difference between well-ad- 
justed and maladjusted individuals in respect 
to what they expect other people to be like. 

In generalizing, it was suggested that the 
tendency of the maladjusted individuals to be 
certain of their first impressions of others is 
likely to result in numerous difficulties for 
them in interpersonal relations, while the 
tendency of the well-adjusted individuals to 
feel uncertain and confused in a similar situa- 
tion will have the opposite effect, of reducing 
the incidence of “mistakes” in social inter- 
actions. 

The theory of the accuracy of perception as 
related to adjustment was discussed in light of 
the findings of this study. 

All except one of the six hypotheses in this 
study have been substantiated by the findings. 


REFERENCES 


Asch, S. E. Forming impressions of personality. J.
abnorm. soc. Psychol., 1946, 41, 258-290.

Cline, V. B., & Richards, J. M. Accuracy of interpersonal
perception: A general trait? J. abnorm. soc. Psychol.,
1960, 60, 1-7.

Dailey, C. A. The effects of premature conclusion
upon the acquisition of understanding of a person.
J. Psychol., 1952, 33, 133-152.

Kelley, H. H. The warm-cold variable in first impressions
of people. J. Pers., 1950, 18, 431-439.

Luchins, A. S. Forming impressions of personality:
A critique. J. abnorm. soc. Psychol., 1948, 43, 318.

Mensh, I. N., & Wishner, J. Asch on “Forming impressions
of personality”: Further evidence. J. Pers., 1947, 16,
188-191.

Secord, P. F. Facial features and inference processes
in interpersonal perception. In R. Tagiuri & L.
Petrullo (Eds.), Person perception and interpersonal
behavior. Stanford: Stanford Univer. Press, 1958.
Pp. 300-315.

Secord, P. F., & Bevan, W. Personalities in faces:
III. A cross-cultural comparison of impressions of
physiognomy and personality in faces. J. soc.
Psychol., 1956, 43, 283-288.

Secord, P. F., & Muthard, J. E. Individual differences
in the perception of women’s faces. J. abnorm. soc.
Psychol., 1955, 50, 238-242. (a)

Secord, P. F., & Muthard, J. E. Personalities in
faces: IV. A descriptive analysis of the perception
of women’s faces and the identification of some
physiognomic determinants. J. Psychol., 1955, 39,
269-278. (b)

Steiner, I. D. Interpersonal behavior as influenced
by accuracy of social perception. Psychol. Rev.,
1955, 62, 268-274.

Taft, R. The ability to judge people. Psychol. Bull.,
1955, 52, 1-21.

Tagiuri, R. Social preference and its perception. In
R. Tagiuri & L. Petrullo (Eds.), Person perception
and interpersonal behavior. Stanford: Stanford
Univer. Press, 1958. Pp. 316-336.


(Received October 11, 1962) 


Vol. 77, No. 7 


Whole No. 570, 1963 


Psychological Monographs: General and Applied 




PREDICTION OF THE FIRST YEAR COLLEGE 
PERFORMANCE OF HIGH APTITUDE 
STUDENTS 1


ROBERT C. NICHOLS AND JOHN L. HOLLAND


National Merit Scholarship Corporation, Evanston, Illinois 


The academic achievement and extracurricular achievements in science, art, 
writing, dramatics, music, and leadership during the 1st year in college of a 
large national sample of high aptitude students were predicted from an assess- 
ment of their aptitudes, originality, self-ratings, aspirations, personality, inter- 
ests, home backgrounds, and the child rearing attitudes of their parents. The 
0-order and multiple correlations between predictors and criteria were presented 
and discussed. The findings revealed a number of nonintellective predictors of 


the college achievement criteria. 


This report is one of a series of studies
designed to develop and to test predic-
tors of the college performance of high apti-
tude students. Previous reports in the series
(Holland, 1958, 1959, 1960, 1961; Holland & 
Astin, 1962) were concerned with developing 
criteria of creative and academic performance 
and screening a large number of potential 
predictors of these criteria. The present study 
is intended to extend the scope of criterion 
performances, to replicate some previous 
findings, and to continue the screening of 
predictors. 

The basic design of this study involved the 
assessment of a sample of National Merit 
Finalists and their parents at the time of the 
Finalists’ graduation from high school in order 
to obtain predictor measures. The students 
were reassessed for criterion performances 1 
year later when they finished their first year 
of college. The predictors were then corre- 
lated with the criteria and several multiple 
prediction methods were tested. 

The use of National Merit Finalists as sub- 
jects is advantageous for a study of this sort 
for two reasons. First, they are a homogene- 
ous group with regard to aptitude, a known 
predictor of achievement of many kinds. 


1 This study is a part of the research program of 
the National Merit Scholarship Corporation and was 
supported by grants from the Carnegie Corporation 
of New York, the National Science Foundation, and 
the Ford Foundation. 


With the variance in aptitude thus restricted, 
other, nonintellective, correlates of achieve- 
ment become more prominent. Second, these 
students are more likely than the average 
student to achieve in all areas (Nichols & 
Davis, 1963), so it is possible to obtain cri- 
teria of unusual achievement even during the 
first year at college. 
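The first advantage can be illustrated with the standard formula for the effect of direct range restriction on a correlation. The sketch below is not part of the original analysis. The function names, the population standard deviation of 100, and the unrestricted aptitude-achievement correlation of .50 are assumptions chosen only for illustration; the restricted standard deviation of 52.92 is the SAT-Verbal value for males given in Table 1.

```python
import math


def restricted_r(r_unrestricted: float, sd_restricted: float, sd_unrestricted: float) -> float:
    """Correlation expected after direct selection on the predictor.

    Standard range-restriction formula with u = sd_restricted / sd_unrestricted:
        r' = r * u / sqrt(1 - r^2 + r^2 * u^2)
    """
    u = sd_restricted / sd_unrestricted
    r = r_unrestricted
    return r * u / math.sqrt(1.0 - r * r + r * r * u * u)


def corrected_r(r_restricted: float, sd_restricted: float, sd_unrestricted: float) -> float:
    """Inverse of restricted_r: estimate the correlation in the unrestricted group."""
    big_u = sd_unrestricted / sd_restricted
    r = r_restricted
    return r * big_u / math.sqrt(1.0 + r * r * (big_u * big_u - 1.0))


# Assumed (hypothetical) population values, for illustration only.
ASSUMED_POPULATION_SD = 100.0
ASSUMED_POPULATION_R = 0.50
# SAT-Verbal standard deviation for males, from Table 1.
SAMPLE_SD = 52.92

r_in_finalists = restricted_r(ASSUMED_POPULATION_R, SAMPLE_SD, ASSUMED_POPULATION_SD)
print(f"aptitude correlation expected among Finalists: {r_in_finalists:.2f}")
```

Under these assumed values the aptitude correlation falls from .50 to roughly .29, which is the sense in which nonintellective correlates can become relatively more prominent in a sample that is homogeneous in aptitude.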


METHOD 


Subjects 


The student sample was obtained from a one-sixth 
random sample of National Merit Finalists who 
graduated from high school in 1960. Approximately 
10,000 Merit Finalists were selected as the highest 
scorers by state from 550,000 students who took 
the National Merit Scholarship Qualifying Test 
(NMSQT) in March 1959. These students took the 
Scholastic Aptitude Test (SAT) in December 1959, 
as a part of the scholarship competition. The Merit 
Finalists were selected solely on the basis of their 
performance on these two tests. A one-sixth random 
sample of students and their parents was selected and 
was tested by mail in May 1960 to obtain the non- 
intellective predictors. Complete materials were re- 
ceived from 1,033 students, a 73% return; 1,007 
mothers and 936 fathers of these students also com- 
pleted questionnaires concerning their attitudes to- 
ward child rearing and their aspirations for their 
children. In June 1961 the 1,033 students were again 
polled by mail, and complete materials were received 
from 819 or 79%. These students were attending 
219 different colleges and universities. Table 1 gives 
descriptive data about the sample. These data are 
consistent with other information which indicates 
that National Merit Finalists are intelligent, have 
excellent academic records, and come from back- 
grounds of high socioeconomic status. 




TABLE 1 


DESCRIPTIVE DATA ON THE SUBJECTS OF THE STUDY 


                                          Males (N = 544)          Females (N = 275)
Descriptive data                          M            SD          M            SD

SAT-Verbal                                667.17       52.92       665.81       47.27
SAT-Mathematical                          698.10       63.05       654.10       66.96
First year college grades (4.0 system)    3.10         .62         3.15         .55
Percentile rank in high school class      94.42        11.12       95.92        6.63
Family income                             $8,300.00    $2,690.00   $8,400.00    $2,760.00


Criteria 

The students answered questionnaires at the end 
of their freshman year in college. From their re- 
sponses 14 criterion measures were obtained. These 
criteria included grades, plans to obtain the PhD, and 
achievement scales. The achievement scales were 
check lists of possible accomplishments in leadership, 
science, dramatics, writing, music, and graphic art. 
A particular item was assigned to a scale solely on
the basis of its content. The internal consistency of 
the resulting scales was often not as high as would 
be desirable in a scale measuring a construct such as 
propensity for achievement in a particular area. 
However, the items in the present criterion scales 
deal with overt behaviors which are themselves of 
social significance. In this instance we are concerned 
with the content validity of the scale rather than 
with its reliability for measuring a construct which 
is in some ways different from the scale content. If 
one accepts the scale content as covering the possible 
achievements in a given area, the only other question 
of validity is that of the accuracy of the students’ 
reports. 

The problem of the scalability of the various items
was largely obviated by the obtained distributions 
of the data. All the achievements in the scales oc- 
curred infrequently, so that on most scales more than 
half the subjects obtained a score of zero. In these 
instances the scores were dichotomized so that the 
criterion was performing one or more of the scale 
behaviors versus not performing any of them.
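The dichotomizing rule just described can be sketched in a few lines. This is a minimal illustration, not the authors' scoring procedure; the function names and the example scores are hypothetical.

```python
from typing import List


def should_dichotomize(scores: List[int]) -> bool:
    """True when more than half of the subjects scored zero on the scale."""
    zeros = sum(1 for s in scores if s == 0)
    return zeros > len(scores) / 2


def dichotomize(scores: List[int]) -> List[int]:
    """1 = performed one or more of the scale behaviors, 0 = performed none."""
    return [1 if s >= 1 else 0 for s in scores]


# Hypothetical check-list totals for ten students (illustrative values only).
example_scores = [0, 0, 3, 0, 1, 0, 0, 2, 0, 0]

if should_dichotomize(example_scores):
    criterion = dichotomize(example_scores)
else:
    criterion = list(example_scores)

print(criterion)
```

A criterion scored in this way is a dichotomous variable, which is why biserial and tetrachoric coefficients are called for in the statistical analyses.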

In constructing the scales, we wanted to include 
only accomplishments of great difficulty and un- 
doubted social value; but at the same time it was 
necessary to include accomplishments which occur 
frequently enough among college freshmen to be re- 
searchable with the sample size available. Therefore, 
two types of items were included: (a) rare achieve- 
ments which involve public recognition and (b) 
lower level achievements which occur more fre- 
quently. Most of the criterion scales were scored for 
all items whether low level or rare and then for the 
rare items alone. Since the variance of the rare items 
was small, most of the variance in the total scale 
was due to the lower level achievements. 

The specific criterion measures and their correla- 
tions with the predictor variables are described later 


in the paper. 


Predictors 


Before the students entered college, data for 154 
predictor variables were obtained. They are listed 
below. 


Aptitude measures. 

1. The SAT-Verbal score. 

2. The SAT-Mathematical score. The students 
took the SAT at regular examining centers under 
standard conditions as part of the National Merit 
scholarship competition.

High school achievements. 

3. Rank in high school graduating class (HSR). 
These data were supplied by the student’s high school 
principal, who indicated the student’s rank and the 
size of his class; from these, percentile ranks were 
calculated. 

4. High school rank relative to college attenders. 
The principal also indicated the proportion of the 
previous year’s graduating class which had entered 
college. Assuming that the same proportion entered 
college that year and that those entering college are 
the highest ranking students, each subject's percentile 
rank relative to students entering college was calcu- 
lated. This method of ranking constituted a rough 
correction in the high school rank for the quality of 
the high school. 

5. Artistic achievement in high school. This varia- 
ble was assessed by an 11-item scale consisting of 
achievement items similar to those used as criteria 
for artistic, musical, literary, and dramatic achieve- 
ment in college. 

6. Scientific achievement in high school. This 
variable was assessed by a 5-item scale similar to the 
criterion scale of scientific achievement in college. 

7. Number of elective offices held in high school 
as reported by the student. 

8. Breadth of creative activity (called creative 
hobbies earlier; Holland, 1961). This variable was 
assessed by a 32-item check list of creative activities 
which ranged from making Christmas cards and coin- 
ing new words to initiating a club and planning an 
independent experiment. The score was the number 
of activities engaged in as reported by the student. 

The students were asked to give a written comment 
on one of six social problems presented to them. 
These comments were scored in two ways as Items 
9 and 10. 

9. Length of comment. 




10. Judged originality of comment. Judgments were 
made on a 3-point scale by an experienced scorer. 

Personality scales. The National Merit Student
Survey (NMSS) was included as a part of the ques-
tionnaire. This inventory includes true-false per-
sonality scales assumed on theoretical grounds or on
the basis of past research to be related to achieve-
ment. Some of these scales were borrowed from
other inventories, some were taken from previous 
forms of the NMSS, and one was constructed for 
this study. 

11. The Femininity scale from the California Psy- 
chological Inventory (CPI). 

12. The Socialization scale from the CPI. 

13. The Social Presence scale from the CPI. 

14. The Persistence scale. This is an internally 
consistent, a priori scale developed for a previous 
form of the NMSS (Holland, 1960). 

15. The Superego scale. This is an internally con- 
sistent, a priori scale (Holland, 1960). 

16. The Preconscious Activity scale. This is an 
internally consistent, a priori scale developed for this 
study to measure Kubie’s (1958) notion of precon-
scious activity as a part of creative performance. Of
the 58 original items, 20 were discarded by internal 
consistency analysis, leaving a 38-item scale which 
had a reliability (KR-21) of .70. 

17. The Dogmatism scale. This scale, developed by 
Rokeach (1956) to measure dogmatism and rigidity 
of thinking, was changed to a true-false format, in 
order to conform to the format of the other scales. 

18. The Acquiescence scale. This scale was devel- 
oped by Couch and Keniston (1960) to measure 
tendency to respond “true” to inventory scales. 

19. The Sense of Destiny scale. This scale was 
developed by Gough (1957). 

20. The Intolerance of Ambiguity scale. This is 
an internally consistent, a priori scale (Budner, 1959). 

21. The Complexity-Simplicity scale. This scale 
was developed by Barron (1953). 

22. The Mastery scale. This is an internally con- 
sistent, a priori scale (Holland, 1962). 

23. The Deferred Gratification scale. This is an 
internally consistent, a priori scale (Holland, 1960) ; 
it was previously called the Play scale with reversed 
scoring. 

24. The Dominance scale. This is a scale from the 
Sixteen Personality Factor (16 PF) Questionnaire 
(Cattell, Saunders, & Stice, 1957).

25. The Super-Ego Strength scale. This is a scale 
from the 16 PF test. 

26. The Radicalism scale. This is a scale from the
16 PF test. 

27. The Risk Taking scale. This is a scale devel- 
oped by Torrance and Ziller (1957). 

28. The Introversion versus Extraversion scale. 
This is a scale from the Myers-Briggs Type Indicator
(Stricker & Ross, 1962). The Myers-Briggs was 
administered as a separate booklet and was not in- 
cluded in the NMSS. 

29. The Intuition versus Sensation scale. This is
a scale from the Myers-Briggs Type Indicator. 


30. The Feeling versus Thinking scale. This is a 
scale from the Myers-Briggs Type Indicator. 

31. The Perceptive versus Judging scale. This is 
a scale from the Myers-Briggs Type Indicator. 

32-37. The students were asked to rank their pref- 
erence for six groups of occupations. Each group 
consisted of six occupational titles selected as rep- 
resentative of the six major occupational interest 
keys of the Holland Vocational Preference Inventory 
(Holland, 1958). They were: 32, Preference for 
realistic occupations; 33, Preference for intellectual
occupations; 34, Preference for social occupations; 
35, Preference for conventional occupations; 36, 
Preference for enterprising occupations; and 37, 
Preference for artistic occupations. 

38. Daydreams about work activities. The students
were asked to give a free-response comment to the 
item: “When I daydream about my future career, 
I visualize myself doing the following things.” Re- 
sponses to this item were coded dichotomously as 
daydreaming about work activities versus all others. 

39. Preference for research role. The students were 
asked to choose from a number of alternatives the 
role they would most prefer in their future occupa- 
tions. Their responses were scored dichotomously as 
research versus all others.

40. Enjoyment of creative thinking. The students 
were asked to give a free-response comment to the 
item: “What single hour in a class during your last 
school year did you like most? Why?” Responses 
were coded dichotomously as one which stimulated 
creative thinking versus all others. 

41. PhD aspiration at high school graduation. The
highest degree which the subject planned to attain 
was dichotomized as PhD versus less than PhD. The 
subjects planning to obtain professional degrees were 
omitted. 

42. Desire to emulate scientists. The students were 
asked which one of 12 famous people they would 
most like to emulate. Their responses were coded as
choosing 1 of the 3 scientists versus all others. 

43-44. The students were asked to indicate the 
school subjects, sports, and activities they enjoyed 
(50 items). Their preferences were scored for the 
total number of items checked and for a Creative 
Activities scale, made up of 19 items found by item 
analysis to be related to creative accomplishment in 
high school. These procedures provided the follow- 
ing variables: 43, Number of activities enjoyed; 
and 44, Creative activities enjoyed. 

Self-ratings. 

45-65. The students were asked to rate themselves, 
using a 4-point scale, on 20 adjectives. These self- 
ratings were scored as the number of traits rated 
in the top two categories (above average) and as the 
individual trait ratings. This scoring yielded the fol- 
lowing variables: 45, Number of self-ratings above 
average; 46, Emotional Stability; 47, Originality; 48, 
Leadership; 49, Popularity; 50, Athletic Ability; 51, 
Dependability; 52, Drive to Achieve; 53, Schol- 
arship; 54, Sociability; 55, Aggressiveness; 56, 
Neatness; 57, Self-Control; 58, Independence; 59, 




Conservatism; 60, Practical-Mindedness; 61, Expres- 
siveness; 62, Cheerfulness; 63, Self-Confidence; 64, 
Self-Understanding; and 65, Perseverance. 

Ratings by mother and father. 

66-107. The students’ parents were asked to rate 
the students on the same 20 traits used in the self- 
ratings. The following variables were thus obtained: 
66, Number of mother’s ratings above average; 67- 
86, Mother’s ratings on the traits listed as Variables 
46-65; 87, Number of father’s ratings above average; 
and 88-107, Father’s ratings on the traits listed as
Variables 46-65. 

Family background. 

108. Urban versus rural background. 

109. Public versus private high school. 

110. Percentage of the student’s high school senior 
class entering college. 

111. Number of siblings. 

112. Father’s education. 

113. Mother’s education. 

114. Educational difference between parents (fa- 
ther minus mother). 

115. Family income. 

116-127. The students’ parents were asked to in- 
dicate their preference for the six groups of occupa- 
tions described above as Items 32-37: 116-121, 
Father’s preference for occupations; and 122-127, 
Mother’s preference for occupations. 

Parental attitudes. 

128. Authoritarian attitudes of the father. 

129. Authoritarian attitudes of the mother. 

130. Difference in Authoritarian Parental Attitude 
score between mother and father (mother minus 
father). 

131-133. An a priori scale of laissez-faire child 
rearing attitudes appropriate for the parents of ado- 
lescents was developed for this study. This scale 
yielded three variables: Item 131, Laissez-faire at- 
titudes of the father; Item 132, Laissez-faire attitudes 
of the mother; Item 133, Difference between mother 
and father in Laissez-faire attitude score (mother 
minus father). 

134-136. The Intolerance for Ambiguity scale dis-
cussed as Variable 20 above was administered to the 
parents. Three variables were thus obtained: 134, 
Father’s intolerance of ambiguity; 135, Mother’s 
intolerance of ambiguity; and 136, Difference be- 
tween mother and father in Intolerance for Ambi- 
guity score (mother minus father). 

137-154. The students’ parents were asked to rank 
nine characteristics of behavior according to the 
degree to which they would like to see this character- 
istic in their child. The following variables were 
thus obtained. Father’s desire that the subject be: 
137, Able to defend himself; 138, Ambitious; 139, 
Curious; 140, Dependable and reliable; 141, A good 
student; 142, Happy and well adjusted; 143, In- 
dependent or self-reliant; 144, Popular; 145, Self- 
controlled. Items 146-154 were the mother’s pref- 
erential ranking of the above 9 characteristics. 
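Three of the derived scores in the list above involve simple computations: the percentile rank in the high school class (Variable 3), the rank relative to college attenders (Variable 4), and the KR-21 reliability reported for the Preconscious Activity scale (Variable 16). The text does not give the exact percentile formula, so the sketch below assumes the common midpoint convention; the function names and example values are hypothetical, and the KR-21 function uses the population variance of the total scores, which is also an assumption.

```python
from statistics import mean, pvariance
from typing import Sequence


def percentile_rank(rank: float, class_size: float) -> float:
    """Percentile rank from class rank (1 = highest) and class size.

    Assumption: midpoint convention, 100 * (N - rank + 0.5) / N.
    """
    if not 1 <= rank <= class_size:
        raise ValueError("rank must lie between 1 and class_size")
    return 100.0 * (class_size - rank + 0.5) / class_size


def percentile_rank_among_college_attenders(
    rank: float, class_size: float, proportion_to_college: float
) -> float:
    """Variable 4: rank relative to the students assumed to enter college.

    Follows the stated assumption that the highest ranking students are the
    ones who enter college, so the comparison group is the top
    proportion_to_college * class_size students.
    """
    n_attenders = proportion_to_college * class_size
    if rank > n_attenders:
        raise ValueError("student ranks below the assumed college-going group")
    return percentile_rank(rank, n_attenders)


def kr21(total_scores: Sequence[float], n_items: int) -> float:
    """Kuder-Richardson formula 21 from total scores on an n_items scale.

    KR-21 = k/(k-1) * (1 - M(k - M) / (k * variance)).
    Assumption: population variance of the total scores is used.
    """
    m = mean(total_scores)
    variance = pvariance(total_scores)
    return (n_items / (n_items - 1)) * (1.0 - m * (n_items - m) / (n_items * variance))


# Hypothetical student ranked 6th in a class of 100, half of which enters college.
hsr = percentile_rank(6, 100)
hsr_college = percentile_rank_among_college_attenders(6, 100, 0.5)
print(hsr, hsr_college)
```

In this hypothetical case the same class rank gives a lower percentile once only college-bound classmates are counted, which is the rough correction for school quality that Variable 4 is meant to provide.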


Statistical Analyses 


Each of the predictors was correlated with each 
of the criteria using product-moment correlations 


for two continuous variables, biserial correlations 
for continuous variables paired with dichotomous 
variables, and tetrachoric coefficients for two di- 
chotomous variables. These correlations were calcu- 
lated separately for all 275 female subjects who re- 
turned the follow-up questionnaire and for a sample 
of 250 males selected at random from the 544 who 
returned the follow-up questionnaire. The actual 
Ns upon which the correlations were based were 
often smaller, because of incomplete questionnaires, 
unscorable items, etc., but all correlations were based 
on Ns larger than 200.

Multiple correlations of the best subset of pre- 
dictors for various criteria were calculated using a 
sample of 322 males; and the same analysis was re- 
peated for a sample of 164 females. These subjects 
were chosen because they had scores on all variables. 
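The choice of coefficient described above depends only on whether each variable is continuous or dichotomous. The sketch below shows that choice and one textbook form of the biserial coefficient. It is an illustration rather than the authors' computing procedure; the function names and the six-case data set are hypothetical, and the biserial formula assumes that a normally distributed variable underlies the dichotomy.

```python
from statistics import NormalDist, mean, pstdev
from typing import Sequence


def choose_coefficient(x_is_dichotomous: bool, y_is_dichotomous: bool) -> str:
    """Name the coefficient used for a given pair of variable types."""
    if x_is_dichotomous and y_is_dichotomous:
        return "tetrachoric"
    if x_is_dichotomous or y_is_dichotomous:
        return "biserial"
    return "product-moment"


def biserial_r(continuous: Sequence[float], dichotomous: Sequence[int]) -> float:
    """Biserial correlation between a continuous and a 0/1 variable.

    r_b = (M1 - M0) / s * (p * q / y), where s is the standard deviation of
    the continuous variable, p and q are the proportions in the two groups,
    and y is the normal ordinate at the point dividing p from q.
    """
    group1 = [c for c, d in zip(continuous, dichotomous) if d == 1]
    group0 = [c for c, d in zip(continuous, dichotomous) if d == 0]
    if not group1 or not group0:
        raise ValueError("both categories of the dichotomy must be present")
    p = len(group1) / len(continuous)
    q = 1.0 - p
    standard_normal = NormalDist()
    ordinate = standard_normal.pdf(standard_normal.inv_cdf(p))
    return (mean(group1) - mean(group0)) / pstdev(continuous) * (p * q / ordinate)


# Hypothetical data: a predictor score and a dichotomized achievement criterion.
predictor = [1, 4, 2, 5, 3, 6]
achieved = [0, 0, 0, 1, 1, 1]
print(choose_coefficient(False, True), round(biserial_r(predictor, achieved), 3))
```

Because the ordinate y depends on the split between the two categories, the sampling error of the coefficient also depends on that split, so very uneven dichotomies give less stable values.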


RESULTS 


In the sections below, the intercorrelations 
of the criteria and the zero-order correlations 
of the predictors with each of the criteria 
will be discussed in turn. The tables report- 
ing the correlates of the various criteria con- 
tain only those variables which were signifi- 
cantly related to the criterion at the .01 level 
in either the male or the female sample or 
at the .05 level in both samples. When these 
significance levels are used, three or four 
correlates of each criterion would be expected 
by chance. Since the standard error of the 
biserial correlation varies with the proportion 
of subjects in each category of the dichoto- 
mous variable, the standard error appropriate 
for the correlations in each table is given at 
the top of the table to aid the reader in 
interpreting significance levels. 
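
The expectation of "three or four" chance correlates can be checked with a short calculation. The sketch below is an assumption-laden reconstruction, not the authors' own arithmetic: it takes the predictor count to be 154 (the highest item number listed above), treats the male and female samples as independent, and assumes every predictor is truly unrelated to the criterion.

```python
def chance_inclusion_probability(strict=0.01, lenient=0.05):
    """Probability that a truly unrelated predictor enters a table:
    significant at `strict` in either sample, or at `lenient` in both."""
    # Significant at the strict level in at least one of two samples.
    either_strict = 1.0 - (1.0 - strict) ** 2
    # Between the two levels in both samples (not already counted above).
    both_lenient_only = (lenient - strict) ** 2
    return either_strict + both_lenient_only


N_PREDICTORS = 154  # assumed from the item numbering, not stated as a count
expected_chance_correlates = N_PREDICTORS * chance_inclusion_probability()
```

Under these assumptions the inclusion probability is about .0215 per predictor, which gives roughly 3.3 chance correlates per criterion, in line with the figure stated in the text.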


Interrelationships of the Criteria 


The correlations among the 14 criterion 
measures are shown in Table 2. The criteria 
tended to have low positive correlations with 
each other; there were no significant nega- 
tive correlations among criteria. However, 
except for the correlations between rare and 
lower level achievements in the same area— 
which were spuriously high because they were 
based on common items—the correlations 
were too low to consider the criteria as 
alternative measures of a general tendency to 
achieve.

One special predictive relationship which 
should be noted before the predictors of the 
individual criteria are discussed is the corre- 
lation of the SAT scores with the college 




TABLE 2 


INTERCORRELATIONS OF THE CRITERIA 
(Males above the diagonal, N = 250; females below the diagonal, N = 275) 


Criterion                                   Correlationᵃ

                                            1 2 3 4 5 6 7 8 9 10 11 12 13 14
1. First year college grades os o7 15 14 —09 —06 06 —06 —08 —07 —13 —12 43 
2. Leadership achievement 21 44 20 28 17 28 44 20 16 29 25 26 —05
3. Rare leadership achievement —06 25 os TiU 14 26 i 17o AU ALEO 17 OST. 
4. Scientific achievement 30 14 —06 90 02 25 11 09 00 —04 —08 00 25 
5. Rare scientific achievement 32 09 —05 96 os «17° 09 —02 —05 —11 —06 —08 14 
6. Achievement in dramatics —18 19 09 —05 —05 22 16 19 17 28 16 70 05
7. Literary achievement —13 (54 13 —03 06 14 agi per game oy 26. 27 26 
8. Rare literary achievement 36 22 OS 1s. 16 10 60 oo —os 06 00 03 20 
9, Musical achievement 04 T38., 20e O0 TAO A D 36 26 09 78 —08 
10. Rare musical achievement 08 © «405 o5 13 09 02 —i1 58 28 19 95 09 
11. Achievement in graphic art 00 25 06 00 ii —02 29 03 26 13 94 90 —02 
12. Rare achievement in graphic art —13 08 05 —06 —05 —06 17 22 08 os 70 92 —02 
13. Total artistic achievement Zos tis 15 Em 023. aBn 22:9) 08, 82 98) TORN 9s —01 
14. PhD aspiration 19 09 02 00 —02 2 13 —11 05 06 08 02 08 


ᵃ Correlations between first year grades and total artistic achievement are product-moment. Correlations between these two
variables and all others are biserials. All other correlations in the table are tetrachoric. Decimals are omitted.


achievements. The SAT-Verbal correlated
positively with literary achievement and the
SAT-Mathematical correlated negatively with
rare leadership achievement. The SAT-Verbal
also correlated with PhD aspiration. These
three correlations were all below .30 and none
of the other 27 correlations between the SAT
and the various criteria was significant at the
.01 level for either sex. The failure of the
aptitude measures to correlate significantly
with the various indexes of achievement is
testimony to the effectiveness of the restric-
tion of range of aptitude of this sample. The
large amount of variance in achievement
measures that is frequently accounted for by
aptitude has been effectively partialed out by
the restriction of range in this sample.


Academic Achievement


The students were asked by questionnaire
to report their first year college grade-point
average on a letter-grade scale (A, A-, B+,
B, B-, etc.); in the few instances where their
school did not use letter grades, they were
asked to translate their average into the letter-
grade scale. Numerical equivalents were as-
signed to the letter grades and were used in all
calculations. As a check on the accuracy of the
students’ reports, transcripts were obtained
from the colleges for 157 students. The cor-
relation between reported and actual grades
was .96. The only major error was that one
student with a C- average reported his aver-
age grade as B+. The average transcript
grade was actually slightly higher than the
average reported grade, indicating that these
students do not give themselves the benefit
of the doubt in borderline cases. Thus, the
students’ reports were accurate enough to
justify using them as the criterion of aca-
demic achievement.

The significant correlates of college grades 
are shown in Table 3. College grades were 
correlated with two of the other criteria in 
the female sample, leadership and rare liter- 
ary achievement, but they were not signifi- 
cantly related to any of the college achieve- 
ments in the male sample. High academic 
achievers of both sexes tended to aspire to 
the PhD at the end of their first year in 
college more often than did low achievers; 
however, this relationship did not hold up 
well predictively. Plans to obtain the PhD 
as reported by the student at the time of 
high school graduation were related to col- 
lege grades for males, though the relationship 
was low (r = .12); they were not related to 
grades for females (r = .01). These results
suggest that changes in PhD aspiration dur- 
ing the first year of college are related to 
grades obtained. 

Grades were not significantly related to 
SAT-Verbal or SAT-Mathematical for either 
sex, a finding which attests to the effective- 
ness of the restriction of range of the aptitude 
variable in this sample. In spite of this re- 
striction of range in aptitude, however, grades 
were significantly related to high school rank 
(r = .24 and .21 for males and females, re- 
spectively). Correcting the HSR measure 
for the proportion of the students entering 




TABLE 3 


CORRELATES OF FIRST YEAR COLLEGE GRADES


Variable                                              Correlation
                                                      Males        Females
Standard error of r                                    .06          .06

Other criteria
       Leadership achievement                          .04          .21
       PhD aspiration at the end of the
         freshman year                                 .34          .14

Predictors
  Previous academic performance
    3. High school rank                                .24          .21
    4. High school rank corrected for percentage
       of class attending college                      .23          .16
  Achievement motivation and perseverance
   53. Scholarship, self-rating                        .20          .25
   74. Scholarship, mother’s rating                    .16          .19
   95. Scholarship, father’s rating                    .17          [illegible]
   14. Persistence scale                               .21          .16
   65. Perseverance, self-rating                       .30          .18
  107. Perseverance, mother’s rating                   .15          .10
   86. Perseverance, father’s rating                   .24          .13
   52. Drive to achieve, self-rating                   .31          .20
   73. Drive to achieve, mother’s rating               .34          .17
   94. Drive to achieve, father’s rating               .14          .17
  Conformity and socialization
   12. Socialization scale                             .28          .14
   15. Superego scale                                  [illegible]  .16
   25. Super-Ego Strength scale                        .13          .12
   51. Dependability, self-rating                      .22          [illegible]
   72. Dependability, mother’s rating                  [illegible]  [illegible]
   77. Neatness, mother’s rating                       [illegible]  .02
   98. Neatness, father’s rating                       .20          .16
   31. Perceptive versus Judging scale                -.28         -.12
   21. Complexity-Simplicity scale                     [illegible]  .01
   23. Deferred Gratification scale                    .07          .17
  Other predictive relationships
   48. Leadership, self-rating                         .09          [illegible]
   87. Number of father’s ratings above average        .11          .16
   11. Femininity scale                                .24          .04
   32. Preference for realistic occupations            .15          .00
   89. Originality, father’s rating                    .05          [illegible]
    7. Elective offices in high school                -.02          [illegible]
   18. Acquiescence scale                             -.18         -.08
   70. Popularity, mother’s rating                     .17          .06
   71. Athletic ability, mother’s rating               .17          .06
  105. Self-confidence, father’s rating                .04          .16
  106. Self-understanding, father’s rating             .15          .08
  118. Father’s preference for social occupations      .03          [illegible]
  127. Mother’s preference for artistic occupations    .03          .15

college only served to lower its correlation 
with college grades. 
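
The effect of restricting the range of aptitude can be sketched numerically. The formula below is the standard textbook relation for direct selection on one variable, under the usual assumptions of linear regression and constant error variance. The function name and the numerical values are illustrative assumptions, not figures from this study.

```python
import math


def restricted_r(r_full, sd_ratio):
    """Correlation expected in a group selected directly on one variable.

    r_full   -- correlation in the unselected population
    sd_ratio -- SD of the selection variable in the selected group
                divided by its SD in the unselected population
    """
    numerator = r_full * sd_ratio
    denominator = math.sqrt(1.0 - r_full ** 2 + (r_full ** 2) * sd_ratio ** 2)
    return numerator / denominator


# Hypothetical values: a population correlation of .50 between aptitude
# and grades, and a selected group whose aptitude SD is 30% as large.
example = restricted_r(0.50, 0.30)
```

With these hypothetical inputs the expected correlation falls from .50 to about .17, which illustrates how a highly selected sample can show negligible aptitude correlations even when aptitude is a strong predictor in the general population.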

The nonintellective predictors of college 
grades that were significant for both sexes 
seem to form two major clusters of traits: 
(a) perseverance and motivation to achieve, 
and (b) conformity and socialization. 

Motivation to achieve is indicated by the 
significant positive correlations for both sexes 
with ratings by the student and both his 
parents on three traits—scholarship, perse- 
verance, and drive to achieve—as well as with 
the Persistence scale. Since college grades 
are predictable from high school performance, 



one might suppose that these correlations, 
especially the ratings, merely involve report- 
ing previous achievement. The degree to 
which this is the case is indicated by the 
correlations of these variables with college 
grades with HSR partialed out (see Table 4). 
In every case the correlation was reduced by 
controlling for HSR; however, most of the 
partials were still statistically significant. 
This finding suggests that the predictive 
validity of these ratings is not accounted for 
entirely by the student’s previous academic 
performance. The control of HSR reduced 


the correlations more in the female sample




TABLE 4 


PARTIAL CORRELATIONS BETWEEN MOTIVATIONAL MEASURES AND COLLEGE GRADES
WITH HIGH SCHOOL RANK HELD CONSTANT


Variable                                 Partial correlation
                                         Males        Females
Scholarship, self-rating                 .18**        .18**
Scholarship, mother’s rating             .13*         .11
Scholarship, father’s rating             .16*         .05
Perseverance, self-rating                [illegible]  .13*
Perseverance, mother’s rating            .12*         .07
Perseverance, father’s rating            .23**        .11
Drive to achieve, self-rating            .30**        .16*
Drive to achieve, mother’s rating        .32**        .13*
Drive to achieve, father’s rating        .12*         .12*
Persistence scale                        [illegible]  .12*

* p < .05.
** p < .01.


than in the male sample. This result is due 
to the generally higher correlation between 
ratings and HSR among female subjects. 
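
The partialing described above can be illustrated with the usual first-order partial correlation formula. In the sketch below the .31 and .24 are the male correlations reported in Table 3 for self-rated drive to achieve and for high school rank. The correlation between the rating and HSR is not reported in this section, so the values .10 and .40 are purely hypothetical, and the function name is likewise an assumption.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator


r_rating_grades = 0.31   # Table 3, males: drive to achieve, self-rating
r_hsr_grades = 0.24      # Table 3, males: high school rank

# Hypothetical rating-HSR correlations (not reported in the text).
low_overlap = partial_r(r_rating_grades, 0.10, r_hsr_grades)
high_overlap = partial_r(r_rating_grades, 0.40, r_hsr_grades)
```

With a rating that overlaps little with HSR the partial stays near .30, while a rating correlated .40 with HSR drops to about .24. This is the mechanism the text invokes for the larger reductions in the female sample.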

Conformity with socially prescribed stand- 
ards as a predictor of academic achievement 
is indicated by the significant correlations 
between college grades and ratings of de- 
pendability and neatness. Inventory scales of 
Super-Ego Strength and Socialization were 
also related to the grade criterion. The nega- 
tive relationship between grades and the 
Perceptive versus Judging scale is consistent 
with these findings, since the items of this 
scale are concerned with the dislike for order, 
routine, and conformity to a schedule. 

Academic achievement was also predicted 
for both sexes by self-ratings of leadership 
and by the number of traits rated above aver- 
age by the father. 

Among males, academic achievement was 
predicted by the somewhat contradictory com- 
bination of the Femininity scale, preference 
for realistic occupations, and mother’s rating 
of athletic ability. Grades were negatively 
related to the Acquiescence and Complexity- 
Simplicity scales in the male sample. 

For females, academic achievement was 
predicted by the Deferred Gratification scale, 
whose item content is consistent with the 
tendencies to persevere and to conform dis- 
cussed above. It was also predicted by the 
number of elective offices held in high school, 
father’s rating of originality, and parents’
preferences for artistic and social occupations.

The Persistence, Super-Ego Strength, 
Femininity, and Socialization scales yielded 
significant correlations similar to those found 
in earlier studies (Holland, 1959, 1960) of 
similar samples. On the other hand, the 
Social Presence, Dominance, and Radicalism 
scales had significant negative correlations 
with first year grades in one or the other of 
the previous studies, but this finding was not 
cross validated in the present study. The
negative relationship between grades and 
radicalism found in previous studies is con- 
sistent with the tendency to conform noted 
above; however, the negative correlations of 
the Social Presence and Dominance scales 
found in previous studies suggest that the 
academic achiever tends to be ill at ease and 
passive in social situations. None of the 
correlates of academic achievement found in 
the present study confirm this suggestion. On 
the contrary, there was some indication that 
the academic achiever tended to be a leader 
in high school. 


Leadership 


The questionnaire that the students re- 
ceived at the end of their freshman year at 
college contained a check list of achievements 
which included 12 items pertaining to leader-
ship in some area. These items were com-
bined to form a leadership score. Since the
total score had a marked positive skew, it 
was dichotomized near the median (two 
achievements) for all analyses. The items 
for this scale and the percentage of the total 
sample checking each were as follows: 


Was elected president of my class (1.0%) 

Initiated or organized a student movement to 
change institutional rules, procedures, or policies 
(2.3%) 

Organized a college political group or campaign
(3.7%)

Elected to one or more student offices. Indicate
how many (15.5%)
(The score for this item was the number of offices 
indicated, which ranged from 0 to 5.) 

Initiated a business enterprise of any kind (4.9%) 

Received an award or special recognition for lead- 
ership of any kind (8.7%) 

Active member of two or more student organiza-
tions (51.3%)

Appointed to one or more student offices (22.3%) 

Nominated to one or more student offices (27.0%) 

Actively campaigned to elect another student to a 
school office (21.7%) 




Participated in an off-campus political campaign 
(3.79%) 

Participated in a student movement to change in- 
stitutional rules, procedures, or policies (23.1%) 


An examination of these 12 items reveals 
that although all of them suggest some in- 
volvement in political activities or leader- 
ship, most presuppose little in the way of 
real leadership ability. Items dealing with 
ordinary or low level leadership activities 
were included in order to obtain adequate 
variance in the scale score. It is also desira- 
ble, however, to obtain a measure of the 
manifestation of outstanding leadership 
ability, which by definition occurs very rarely. 
Therefore, the first 3 items listed above were
scored as a Rare Leadership Achievement
scale. The student was given a score of 1
for rare leadership if he checked any one of
these three items and a score of 0 if he
checked none. Nine percent of the males and
4% of the females checked one or more of
the rare leadership items. In discussing the
criterion measures of leadership, the total
score for all items will be referred to as the
Leadership Achievement scale, while the score
for the first 3 items will be referred to as
the Rare Leadership Achievement scale.

The significant correlates of leadership are
shown in Table 5. Achievement of this kind 
was related to most of the other achievement 
criteria in the study. Among the predictors 
there appear to be three major types of 


TABLE 5 


CORRELATES OF ACHIEVEMENT IN LEADERSHIP DURING THE FIRST YEAR OF COLLEGE


Variable                                              Correlation
                                                      Males        Females
Standard error of r                                    .08          .08

Other criteria
       First year college grades                       .04          .21
       Rare leadership achievement (item overlap)      .44          .25
       Rare scientific achievement                     .25          .09
       Literary achievement                            .28          .54
       Rare literary achievement                       .44          .22
       Musical achievement                             .20          .38
       Achievement in graphic art                      .29          .25
       Rare achievement in graphic art                 .25          .08
       Total artistic achievement                      .26          .18

Predictors
  Leadership and breadth of activity in high school
    7. Number of offices held in high school           .34          .20
   48. Leadership, self-rating                         .36          .13
   69. Leadership, mother’s rating                     .18          .20
    8. Breadth of creative activity                    .29          .16
   43. Number of activities checked as enjoyed         .36          .10
  Extraversion and social interest
   28. Introversion versus Extraversion scale         -.12         -.15
   19. Sense of Destiny scale                          .20          .11
   34. Preference for social occupations               .21          .14
   32. Preference for realistic occupations           -.20         -.13
   49. Popularity, self-rating                         [illegible]  [illegible]
   54. Sociability, self-rating                        .20          .13
   27. Risk Taking scale                               .19          [illegible]
  Originality and creativity
   40. Course most enjoyed was one that stimulated
       creative thinking                               .19          .19
   44. Creative activities enjoyed                     .33          [illegible]
   61. Expressiveness, self-rating                     .20          .10
   82. Expressiveness, mother’s rating                 .26          .07
   47. Originality, self-rating                        .22          .06
  Other predictors
  113. Mother’s education                              .20          .18
  112. Father’s education                              .24          .08
  114. Educational difference between parents
       (father minus mother)                           .06         -.18
   11. Femininity scale                                .43          .03
   15. Superego scale                                  .21          .03
   22. Mastery scale                                   .20          .04
   45. Number of self-ratings above average            .18          .18
   66. Number of mother’s ratings above average        .06          .20
   78. Self-control, mother’s rating                   .23          [illegible]
  139. Father’s desire that the child be curious      -.06         -.19



variables: (a) those reflecting leadership and 
breadth of activity in high school, (b) those 
reflecting extraversion and sociability, and 
(c) those reflecting originality and creativity. 

Variables which reflect leadership and 
breadth of activity in high school had the 
highest predictive relationships with the 
Leadership Achievement scale, which was re-
lated to the number of offices held in high
school and to self-ratings and mother’s rat- 
ings of leadership. It was also related to 
breadth of creative activity, number of ac- 
tivities checked as enjoyed, and the Risk 
Taking scale, which is composed mainly of 
life history items dealing with a wide range 
of adventurous behavior. 

Leadership achievement was also predicted 
by measures of extraversion and sociability 
for both sexes. It was correlated negatively 
with the Introversion versus Extraversion 
scale and positively with the Sense of Destiny 
scale, which consists mainly of items dealing 
with dominating, exhibitionistic tendencies 
and with decisiveness. Leadership was cor-
related positively with a liking for social
occupations and negatively with a liking for 
realistic occupations, which suggests an inter- 
est in people rather than things. It was also 
related to self-ratings of popularity and 
sociability, and to the number of self-ratings 
above average, a measure of positive self-
evaluation, for both sexes. For females the
number of mother’s ratings above average was
also related to leadership.

Leadership was also related to several 
variables which may be regarded as evidence 
of originality and creative thinking. The 
leaders of both sexes tended to report in a 
free-response item that the class hour they 
had most enjoyed in high school was one 
which stimulated their creative thinking; 
they indicated, on a check list, that they en- 
joyed activities which are related to creative 
accomplishment; and they rated themselves 
high on expressiveness. Male subjects tended 
to rate themselves as original and their 
mothers tended to rate them high on expres- 
siveness. 

The educational level of the mother was 
related to leadership for both sexes but the 
father’s education was significantly related to 
leadership for the male subjects only. The
relationship of the measure of educational
difference between the parents and leadership 
in females suggests that it is the mother’s 
education relative to the father’s which is 
important for the emergence of leadership in 
girls. If the mother’s education was low 
relative to that of the father, the daughter 
was less likely to be a leader. 

For male subjects, leadership was predicta- 
ble from the Femininity, Super-Ego, and 
Mastery scales. The Mastery scale is con- 
cerned with self-confidence and with a sense 
of purposefulness and devotion to a worth- 
while cause. There is also something of this 
sense of devotion to a right and worthy cause 
in the content of the Femininity and Superego 
scales and this may be the common element 
responsible for the correlations of these 
scales with the leadership criterion. 

The correlates of rare leadership are shown 
in Table 6. Rare leadership was much less 
predictable in this study than the total 
Leadership Achievement scale was, and it had 
fewer correlates with the other achievement 
criteria. This result may be due to a number 
of factors: the restricted variance of the 
Rare Leadership Achievement scale, the 
probability of a greater degree of error in 
measurement, the possibility that rare leader- 
ship among college freshmen is due to en- 
vironmental contingencies rather than to the 
personal characteristics of the student, or the 
possibility that appropriate measures of the 
personal traits involved in rare leadership 
achievement were not included among the 
predictors. The restricted variance of the 
scale is at least partially responsible, since 
the rare achievement scales for the other 
criteria also tended to have fewer correlates. 
The fact that the significant predictors of rare 
leadership differed somewhat from those for 
the lower level leadership achievements sug- 
gests that different personality factors may be 
involved. The correlations between the Rare 
Leadership Achievement and the Leadership 
Achievement scales—.44 and .25 for males 
and females, respectively—were not high, 
considering the degree of item overlap. 
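
The cost of a very uneven split can be made concrete with the classical large-sample standard error of a biserial correlation when the true correlation is zero. The sketch below is an approximation under stated assumptions: the formula is the textbook one, the function name is hypothetical, and N = 250 is the nominal size of the male sample (the actual Ns were often smaller).

```python
import math
from statistics import NormalDist


def biserial_null_se(p, n):
    """Approximate standard error of a biserial r when the true r is zero.

    p -- proportion of subjects in one category of the dichotomy
    n -- number of subjects
    """
    q = 1.0 - p
    # Height of the unit normal curve at the point dividing p from q.
    ordinate = NormalDist().pdf(NormalDist().inv_cdf(p))
    return math.sqrt(p * q) / (ordinate * math.sqrt(n))


se_median_split = biserial_null_se(0.50, 250)  # scale cut near the median
se_rare_split = biserial_null_se(0.09, 250)    # 9% of males checked a rare item
```

Under these assumptions the standard error rises from about .08 for a median split to about .11 for a 9% split, so a correlation of a given size is harder to distinguish from chance on the rare scale.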

Rare leadership achievement was best pre- 
dicted for males by artistic achievement in 
high school. (It was unrelated to number of 
offices held in high school.) It was also re- 




TABLE 6 


CORRELATES OF RARE LEADERSHIP ACHIEVEMENT
DURING THE FIRST YEAR OF COLLEGE


Variable                                              Correlation
                                                      Males        Females
Standard error of r                                    [illegible]  .33

Other criteria
       Leadership achievement (item overlap)           .44          .25
       Literary achievement                            .26          .13
       Musical achievement                             [illegible]  .23

Predictors
  Artistic orientation
    5. Artistic achievement in high school             .32          .08
  121. Father’s liking for artistic occupations        .32          [illegible]
  Extraversion and self-confidence
   28. Introversion versus Extraversion scale         -.11         -.23
   19. Sense of Destiny scale                          .32          .07
  Other predictors
    2. SAT-Mathematical                               -.30         -.12
   17. Dogmatism scale                                 .28          .06
   81. Practical-mindedness, mother’s rating          -.26          .04
   95. Scholarship, father’s rating                   -.04         -.29
  146. Mother’s desire that the child be able to
       defend himself                                  .28          .03
  142. Father’s desire that the child be happy and
       well adjusted                                  -.39          .09


lated to the father’s expressed liking for 
artistic occupations in both sexes. 

The finding that rare leadership was nega- 
tively related to the Mathematical score of 
the SAT for both sexes is difficult to interpret 
because of the restricted range of ability in 
this sample. In a more heterogeneous sam- 
ple it is likely that both SAT-Verbal and 
SAT-Mathematical would be correlated with 
leadership achievement. However, among 
very high aptitude students, those highest in 
mathematical ability are less likely to show 
rare leadership achievement. 

As in the case of the Leadership Achieve- 
ment scale, rare leadership was negatively re- 
lated to the Introversion versus Extraversion 
scale; and in the male subjects it was also 
related to the Dogmatism and Sense of Des-
tiny scales. The item content of the Sense of
Destiny scale is concerned with extraversion 
and decisiveness, and the Dogmatism scale is 
concerned with dogmatic belief and devotion 
to a cause, as well as with self-confidence. 
These scales imply traits which were noted 
earlier as predictors of the leadership scale. 

There were two interesting negative rela- 
tionships between rare leadership achieve- 
ment and parents’ ratings. The scale was 
negatively related to mother’s rating of prac- 
tical mindedness in males and to father’s 
rating of scholarship in females. 


Scientific Achievement 


The check list of achievements which the 
students completed at the end of their first 
year at college included nine items pertain- 
ing to achievement in science. All of the 
scientific achievements were infrequently 
checked and the total score was highly 
skewed. It was therefore dichotomized as one 
or more achievements checked versus none 
checked. Eleven percent of the males and 
3% of the females checked one or more items. 
The items and the percentage of the total 
sample checking each were as follows: 

Received a research grant (1.3%) 

Received a prize or award for a scientific paper
or project (.8%)

Gave an original paper at a scientific or profes- 
sional meeting sponsored by a professional society 
or association (.27%) 

Had scientific or scholarly paper published (or in 
press) in a scientific or professional journal (.2%) 

Invented a patentable device (.67%) 

Member of student honorary scientific society
(1.8%)

Built a piece of equipment or laboratory apparatus
of my own (not as part of a course) (3.8%)

Appointed a teaching assistant in a scientific field
(1.69%)

Entered a scientific competition of any kind
(1.5%)

The first six of these items seemed to offer 
more dependable testimony of scientific 
achievement than the last three, so they were 
scored separately as a Rare Scientific 
Achievement scale. Only 6% of the males 
and 2% of the females checked one or more 
of the Rare Scientific Achievement items. 
The two Scientific Achievement scales were 
highly correlated (.90 and .94 for males and 
females, respectively) as is to be expected 




because of the large proportion of common 
items. 
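
The scoring rules for the two scientific scales can be written out as a short sketch. The item keys below are hypothetical labels for the nine check-list items, and the function name is an assumption. The rules themselves follow the text: the total is dichotomized as one or more items checked versus none, and the rare scale is scored from the first six items only.

```python
# Hypothetical short labels for the nine items, in the order listed.
SCIENCE_ITEMS = [
    "research_grant", "prize_for_paper", "paper_at_meeting",
    "paper_published", "patentable_device", "honorary_society",
    "built_apparatus", "teaching_assistant", "entered_competition",
]
RARE_SCIENCE_ITEMS = set(SCIENCE_ITEMS[:6])  # the first six items


def score_science(checked):
    """Return the two dichotomous scale scores for one student.

    checked -- set of item labels the student checked
    """
    checked = set(checked) & set(SCIENCE_ITEMS)
    return {
        # One or more achievements checked versus none checked.
        "scientific": 1 if checked else 0,
        # One or more of the first six items checked versus none.
        "rare_scientific": 1 if checked & RARE_SCIENCE_ITEMS else 0,
    }


only_common = score_science({"built_apparatus"})
with_rare = score_science({"research_grant", "built_apparatus"})
nothing = score_science(set())
```

Because every student who scores 1 on the rare scale necessarily scores 1 on the total scale, the two scales share most of their variance by construction, which is the item overlap the text refers to.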

The significant correlates of the total 
Scientific Achievement scale are shown in 
Table 7. Achievement in science was not 
highly related to other kinds of achievement 
in the first year of college. Scientific achieve- 
ment was related to PhD aspiration, and the 
Rare Scientific Achievement scale was related 
to leadership for the male subjects. But none 


TABLE 7 


CORRELATES OF SCIENTIFIC ACHIEVEMENT DURING
THE FIRST YEAR OF COLLEGE


Variable                                              Correlation
                                                      Males        Females
Standard error of r                                    .10          .15

Other criteria
       Rare scientific achievement (item overlap)      .90          .94
       PhD aspiration at end of freshman year          .27          .15

Predictors
  Scientific interest and achievement
    6. Scientific achievement in high school           .32          .08
   42. Desire to emulate the lives of scientists       .27          .29
   41. PhD aspiration in high school                   .33          .32
  Parental ratings
   83. Cheerfulness, mother’s rating                   .27          .08
   82. Expressiveness, mother’s rating                 .27          .15
   89. Originality, father’s rating                    .25          .05
   97. Aggressiveness, father’s rating                 .17          .40
   99. Self-control, father’s rating                   .10          .40
  105. Self-confidence, father’s rating                .17          .43
   86. Perseverance, mother’s rating                   .25          .13
  107. Perseverance, father’s rating                   .13          .38
   87. Number of father’s ratings above average        .20          .38
  Parental characteristics and attitudes
  112. Father’s education                              .27          .08
  131. Father’s laissez-faire child rearing attitudes  .30         -.04
  139. Father’s desire that the child be curious       .25          .05


of the correlations with the rest of the criteria 
were significant at the .01 level for either 
sex. 

In this study, scientific achievement was 
not very predictable. Only two predictors 
were consistently related to both Scientific 
Achievement scales for both male and female 
samples. They were plans to obtain the PhD 
and positive evaluation by the father as ex- 
pressed by the number of father’s ratings 
above average. 

None of the personality scales or the self- 
ratings was predictive of either of the Scien- 
tific Achievement scales. The major corre- 
lates of achievement in science were the 
parental ratings. The picture of the scientific 
achiever obtained from the parents was simi- 
lar for both males and females, but the 
mother’s ratings tended to be related to scien- 
tific achievement in males while the father’s 
ratings tended to have higher correlations for 
females. These ratings pictured the scientific 
achiever as cheerful, expressive, original, ag- 
gressive, self-controlled, and persevering. He 
tends to have scientific aspirations, as is indi- 
cated by his desire to emulate the lives of 
scientists and by his scientific achievement 
in high school; the latter correlation was not 
significant for females. 

The correlates of rare scientific achieve- 
ment are shown in Table 8. The male with 
rare scientific achievement tends to be rated 
by his father as self-confident but lacking in 
self-understanding. He usually comes from a
high income family, and is rated as popular 
by his mother. The female with rare scientific 
achievement tends to be rated as conserva- 
tive, self-controlled, and self-confident. She 
frequently expresses interest in realistic oc- 
cupations and is rated low in popularity by 
her mother. 

Several variables concerning the attitudes 
and values of the parents were related to 
scientific achievement, but they did not show 
enough consistency between sexes or from 
one scale to another to be thoroughly con- 
vincing. The general impression that one 
gets from these correlations is that laissez- 
faire child rearing practices and parental de- 
sire for the child to be independent and 
self-reliant are related to future scientific 
achievement. 




TABLE 8 


CORRELATES OF RARE SCIENTIFIC ACHIEVEMENT 
DURING THE FIRST YEAR OF COLLEGE 


Variable                                              Correlation
                                                      Males        Females
Standard error of r                                    .13          .17

Other criteria
       Leadership achievement                          .25          .09
       Scientific achievement (item overlap)           .90          .94

Predictors
  Scientific interest and achievement
   32. Preference for realistic occupations           -.06          .43
   41. PhD aspiration in high school                   .26          .46
  Parental ratings
   70. Popularity, mother’s rating                     .30          [illegible]
   80. Conservatism, mother’s rating                   .04          .43
   99. Self-control, father’s rating                   .08          .43
  105. Self-confidence, father’s rating                .22          .35
  106. Self-understanding, father’s rating            -.32          .09
   87. Number of father’s ratings above average        .26          .32
  Parental characteristics and attitudes
  112. Father’s education                              .32          .12
  113. Mother’s education                              .36         -.03
  115. Family income                                   .38         -.12
  138. Father’s desire that the child be ambitious     .16          .60
  154. Mother’s desire that the child be
       self-controlled                                -.32         -.10
  126. Mother’s preference for enterprising
       occupations                                     .24          .30


Achievement in Dramatics 


The check list of achievements which the 
students completed at the end of their first 
year of college included four items pertain- 
ing to achievement in speaking or dramatic 
activities. All of the dramatic items were
infrequently checked and the total score was 
highly skewed. It was, therefore, dichoto- 
mized as one or more achievements checked 
versus none checked. Ten percent of the 
males and 3% of the females checked one 
or more items. The items and the percentage 
of the total sample checking each were as 
follows: 


Won one or more speech or debate contests (2.8%)

Was regular performer on radio or TV program 
(2.6%) 

Received an award for acting, playwriting or other 
phase of drama (.4%) 

Had minor roles or leads in plays not produced 
by a college or university (1.6%) 


Since the achievements covered by these 
items occurred very rarely and since they 
seem to be evidence of genuine dramatic 
ability, a subset of rare items was not scored 
for this scale. 

The correlates of achievement in dramatic 
activities are shown in Table 9. Dramatic 
achievement was independent of most of the 
other criteria. It was related only to achieve- 
ment in graphic arts for males and to the 


TABLE 9 


CORRELATES OF ACHIEVEMENT IN DRAMATICS DURING
THE FIRST YEAR OF COLLEGE


Variable                                              Correlation
                                                      Males        Females
Standard error of r                                    [illegible]  .15

Other criteria
       Achievement in graphic art                      .28         -.06
       Total artistic achievement (item overlap)       .70          .48

Predictors
  Creativity
    5. Artistic achievement in high school             [illegible]  [illegible]
   44. Number of creative activities enjoyed           .24          .43
    8. Breadth of creative activity                    .26          .03
   40. Course most enjoyed was one which stimulated
       creative thinking                              -.19          .40
  Social dominance
   28. Introversion versus Extraversion scale         -.26         -.30
   24. Dominance scale                                 .24          [illegible]
   13. Social Presence scale                           .22          .33
   91. Popularity, father’s rating                     .10          .43
    9. Length of free-response comment on a social
       problem                                         .31         -.18
  Other predictors
   99. Self-control, father’s rating                  -.27          .28
   71. Athletic ability, mother’s rating              -.27          .15


APTITUDE AND First YEAR COLLEGE PERFORMANCE 13 


total Artistic Achievement scale, with which 
its items overlap. 

Dramatic achievement was related to most 
of the predictors assumed to be measures of 
creativity. For both sexes, it was correlated 
with artistic achievement in high school and 
with the number of creative activities en- 
joyed. It was related to breadth of creative 
activity for males and for females to the 
coded free response that the most enjoyed 
class hour in high school was one which 
stimulated creative thinking. The latter re- 
lationship, however, was negative for males. 

As might be expected, dramatic achieve- 
ment can be predicted from measures of 
“outgoingness” and dominance in social situa-
tions. It was negatively related to the
Introversion versus Extraversion scale, and 
positively related to the Dominance and Social 
Presence scales for both sexes. Dramatic 
achievement was correlated with father’s rat- 
ings of popularity for females and with the 
length of the free-response comment on a 
social problem for males. 

Dramatic achievement was positively re- 
lated to self-control as rated by the father 
for females, but was negatively related to this 
same variable for males. It was also nega- 
tively related to mother’s rating of athletic 
ability. 


Literary Achievement 


The check list of achievements which the 
students completed at the end of their first 
year of college included seven items pertain- 
ing to achievement in writing or other liter- 
ary activities. The total score was highly 
skewed and was therefore dichotomized as 
one or more achievements checked versus 
none checked. Forty-one percent of the 
males and 38% of the females checked one 
or more of the literary achievement items. 
The items and the percentage of the total 
sample checking each were as follows: 

Had poems, stories, essays, or articles published in 
a public newspaper, magazine, anthology, etc. (not 
college publication) (2.0%) 

Wrote one or more plays (including radio or TV 
plays) which were given public performance (5%) 

Did news or feature writing for a public newspaper 
(1.7%) 

Had poems, stories, essays, or articles published in 
a college publication (7.1%)


Won literary award or prize for creative writing 
(9%) 

Was editor or feature writer for collegiate paper, 
annual, magazine, or anthology, etc. (5.4%) 

Wrote an original, but unpublished piece of crea- 
tive writing on my own (not as part of a course) 


(21.6%) 


Because the first five of the above items 
occur rarely, they were scored separately as 
a Rare Literary Achievement scale. Since 
most of the variance in the total Literary 
Achievement scale is accounted for by un- 
published writing, the major difference be- 
tween literary achievement and rare literary 
achievement is the factor of publication or 
public recognition. The two Literary
Achievement scales were correlated .77 and 
.66 for males and females, respectively, but 
these coefficients are spuriously high because 
of the item overlap. 
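
The inflation attributable to item overlap can be illustrated with a small simulation. The sketch below is not part of the original analysis. It assumes seven statistically independent check-list items with invented base rates (the `ITEM_RATES` values are illustrative only, not the percentages reported above), scores a total scale as one or more of the seven items checked and a rare scale as one or more of the first five items checked, and compares the resulting correlation with the correlation between the rare scale and the two items it does not share.

```python
import random

# Hypothetical base rates for seven independent check-list items
# (illustrative values only; not the percentages reported in the study).
ITEM_RATES = [0.02, 0.02, 0.02, 0.07, 0.02, 0.05, 0.20]
RARE = range(5)        # the first five items form the "rare" scale
COMMON = range(5, 7)   # the remaining two items do not overlap with it


def pearson(x, y):
    """Product-moment correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_subjects=5000, seed=1963):
    """Return (r_overlap, r_no_overlap) for simulated dichotomized scales."""
    rng = random.Random(seed)
    total, rare, common = [], [], []
    for _ in range(n_subjects):
        checks = [rng.random() < p for p in ITEM_RATES]
        total.append(int(any(checks)))                       # any of 7 items
        rare.append(int(any(checks[i] for i in RARE)))       # any of first 5
        common.append(int(any(checks[i] for i in COMMON)))   # any of last 2
    return pearson(rare, total), pearson(rare, common)


r_overlap, r_no_overlap = simulate()
print(f"rare vs. total (item overlap):  {r_overlap:.2f}")
print(f"rare vs. non-overlapping items: {r_no_overlap:.2f}")
```

Because the simulated items are generated independently, any sizable correlation on the first line arises solely from the shared items, which is the sense in which such coefficients are spuriously high.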

The correlates of literary achievement are 
shown in Table 10. Achievement in writing 
was related to all of the other achievement 
criteria with the exception of scientific and 
dramatic achievement.

Literary achievement was related to a va- 
riety of measures of creativity, originality, 
and preference for imagination and intuition 
over facts and careful reasoning. For both 
sexes, the Literary Achievement scale was 
positively correlated with artistic achieve- 
ment in high school; breadth of creative ac- 
tivities; number of creative activities en- 
joyed; ratings of originality by mother, 
father, and self; self-ratings and father’s rat- 
ings of expressiveness; and the inventory 
scales of Preconscious Activity, Complexity- 
Simplicity, Intuition versus Sensation, and 
Perceptive versus Judging. Literary achieve- 
ment was also related positively to prefer- 
ences for artistic occupations and negatively 
to preferences for conventional and realistic 
occupations, but the latter correlation was 
not significant for males. Among male sub- 
jects literary achievement was positively re- 
lated to the Feeling versus Thinking scale 
and to the coded free response that the most 
enjoyed high school course was one which 
stimulated creative thinking and negatively 
with the desire to emulate scientists. The
expression of impulse implicit in the above
correlates of literary achievement appears to
extend to a general disregard for conventional
standards of behavior among the females as
is indicated by the negative correlations of
the Socialization and Super-Ego Strength
scales for female subjects.


TABLE 10

CORRELATES OF LITERARY ACHIEVEMENT DURING THE FIRST YEAR OF COLLEGE

                                                Correlation
Variable                                        Males        Females
Standard error of r                              .08          .08
Other criteria
  Leadership achievement                         .28          .54
  Rare leadership achievement                    .26         [illegible]
  Rare literary achievement (item overlap)       .77          .66
  Musical achievement                            .22          .19
  Rare musical achievement                       .23          .01
  Achievement in graphic art                     .37          .29
  Rare achievement in graphic art                .26          .17
  Total artistic achievement (item overlap)      .27          .22
  PhD aspiration at end of freshman year         .22          .18
Predictors
  Creativity and originality
     5. Artistic achievement in high school      .14          .36
     8. Breadth of creative activity             .27          .33
    44. Number of creative activities enjoyed    .15          .26
    47. Originality, self-rating                 .23          .43
    68. Originality, mother’s rating             .22          .32
    89. Originality, father’s rating             .19          .26
   103. Expressiveness, father’s rating          .37          .22
    61. Expressiveness, self-rating              .30          .33
    82. Expressiveness, mother’s rating          .16          .19
    16. Preconscious Activity scale              .28          .40
    21. Complexity-Simplicity scale             [illegible]   .24
    29. Intuition versus Sensation scale         .20          .24
    31. Perceptive versus Judging scale          .10          .20
    37. Preference for artistic occupations      .40          .38
    35. Preference for conventional occupations -.16         -.26
    32. Preference for realistic occupations    -.37         -.08
    30. Feeling versus Thinking scale            .19         -.08
    42. Desire to emulate the lives of
        scientists                              -.32         [illegible]
    12. Socialization scale                      .00         [illegible]
    25. Super-Ego Strength scale                 .03         [illegible]
  Extraversion and social dominance
     7. Number of elective offices in high
        school                                   .06          .20
     9. Length of free-response comment on a
        social problem                           .35          .14
    48. Leadership, self-rating                 [illegible]   .20
    97. Aggressiveness, father’s rating          .20          .14
    58. Independence, self-rating                .05          .20
    24. Dominance scale                          .20          .26
    19. Sense of Destiny scale                   .19          .23
    28. Introversion versus Extraversion scale  -.15         -.18
    13. Social Presence scale                    .05          .19
  Other predictors
     1. SAT-Verbal                               .14          .20
    41. PhD aspiration in high school            .28          .01
    71. Athletic ability, mother’s rating       -.29          .04
   116. Father’s preference for realistic
        occupations                             -.22         -.06
   141. Father’s desire that the child be a
        good student                             .00         -.19

Literary achievement was also related to a 
number of measures of extraversion and 
social dominance. It was positively correlated 
in both sexes with the number of elective 
offices held in high school, the number of 
activities enjoyed, the length of the free- 
response comment on a social problem, self- 
ratings of leadership, ratings by the father of 


aggressiveness and practical-mindedness, and 
the personality scales of Dominance and 
Sense of Destiny; it was negatively correlated 
with the Introversion versus Extraversion 
scale. For female subjects it was also posi- 
tively associated with the Social Presence 
scale and self-ratings of independence. 

In addition to the above measures of origi- 
nality and leadership, literary achievement 
was positively related to the SAT-Verbal for 
both sexes and to high school plans to obtain 
the PhD degree for male subjects, but not
for female subjects. Also father’s authori-
tarian child rearing attitudes and intolerance
of ambiguity were negatively related to rare
literary achievement for males and positively
related for females. Since it is possible that
literary achievement is thought to be appro-
priate for females more so than for males,
perhaps the authoritarian father tends to in-
fluence his child to achieve in areas considered
more appropriate for his sex.


TABLE 11

CORRELATES OF RARE LITERARY ACHIEVEMENT DURING
THE FIRST YEAR OF COLLEGE

                                                Correlation
Variable                                        Males        Females
Standard error of r                              .09         [illegible]
Other criteria
  College grades                                 .05          .35
  Leadership achievement                        [illegible]  [illegible]
  Literary achievement (item overlap)            .77          .66
Predictors
  Creativity and originality
     5. Artistic achievement in high school      .23          .26
     8. Breadth of creative activity             .23          .36
    44. Number of creative activities enjoyed    .22          .24
    68. Originality, mother’s rating             .25          .18
    61. Expressiveness, self-rating              .29          .26
    82. Expressiveness, mother’s rating          .20          .30
   103. Expressiveness, father’s rating          .25          .32
    37. Preference for artistic occupations      .34          .38
    35. Preference for conventional occupations -.23         -.22
    32. Preference for realistic occupations    -.31         -.06
    40. Course most enjoyed was one which
        stimulated creative thinking             .34         -.14
  Extraversion and social dominance
     7. Number of elective offices in high
        school                                   .26          .24
    24. Dominance scale                          .09          .30
    19. Sense of Destiny scale                   .23          .32
    43. Number of activities enjoyed             .16          .30
    69. Leadership, self-rating                  .22          .20
    49. Popularity, self-rating                 -.07          .34
  Other predictors
    41. PhD aspiration in high school            .22         -.08
    78. Self-control, mother’s rating            .22          .12
   117. Father’s preference for intellectual
        occupations                             -.19         -.28
   128. Father’s authoritarian child rearing
        attitudes                               -.25          .30
   134. Father’s Intolerance of Ambiguity scale -.22          .22
   145. Father’s desire that the child be
        self-controlled                          .25         -.14

The correlates of the Rare Literary 
Achievement scale are shown in Table 11. 
Unlike the total Literary Achievement score, 
rare literary achievement, which involves 
publication, was not related to the other ar- 
tistic achievements. Instead it was related to 
leadership and, in the case of the female sub- 
jects, to grades. These results suggest that 
the activity and drive which are involved in 
leadership achievement may be the traits 
necessary to get writing published. 

The two general classes of predictors of the 
total Literary Achievement scale, originality 
and extraversion, were also found among the 
predictors of rare literary achievement. Al- 
most all of the high school behavior and 
ratings which were predictive of the total 
Literary Achievement scale were also related 
to rare literary achievement. However, none 
of the seven creativity and personality scales 
which were related to the former criterion 
was related to the latter. This finding sup- 
ports the notion mentioned above that publi- 
cation, at least during the first year of college, 
has a larger component of energy level and 
drive than of creativity. 


Musical Achievement 


The check list of achievements contained 
10 items pertaining to achievement in some 
phase of music. The total score was highly 
skewed and was therefore dichotomized as 
one or more musical achievements checked 
versus none checked. Twenty-two percent of 
the males and 24% of the females checked 
one or more of the musical achievement items. 
The items and the percentage of the total 
sample checking each were as follows: 


Won prize or award in a musical competition as a 
performer (5%) 




Composed music which has been given at least one 
public performance (1.2%) 

Arranged music for public performances (1.3%) 

Had one or more musical publications (.1%) 

Performed with a professional orchestra (2.6%) 

Gave a recital (not collegiate) (.9%) 

Directed (publicly) a choir (.4%) 

Organized a singing group (1.3%) 

Sang in a college choir (13.4%) 

Gave a collegiate recital (2.6%) 


Since the first seven of the above items 
occur rarely and involve a judgment by some 
external authority on the excellence of the 
musical accomplishment, they were scored 
separately as a Rare Musical Achievement 
scale. A subject was given a score of 1 if he 
checked one or more of the rare musical items 
and a score of 0 if he checked none. Eleven
percent of the males and 5% of the females
reported one or more rare musical achieve-
ments. The Rare Musical Achievement scale
correlated .86 and .58 with the Musical 
Achievement scale for males and females, 
respectively, but these coefficients are inflated 
by item overlap. 
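
The scoring rule just described can be sketched in a few lines. The function below is an illustration only; the name `score_musical`, the representation of a check list as ten booleans, and the constants are assumptions introduced here, not part of the original procedure. It returns the dichotomized Musical Achievement score (one or more of the 10 items checked) and the Rare Musical Achievement score (one or more of the first seven items checked).

```python
N_ITEMS = 10
RARE_ITEMS = range(7)  # positions of the first seven items as listed above


def score_musical(checked):
    """Return (musical, rare_musical), each scored 1 or 0.

    `checked` holds ten booleans, one per check-list item, in the
    order in which the items are listed in the text.
    """
    if len(checked) != N_ITEMS:
        raise ValueError("expected one entry per check-list item")
    musical = int(any(checked))                       # any of the 10 items
    rare = int(any(checked[i] for i in RARE_ITEMS))   # any of the first 7
    return musical, rare


choir_only = [False] * 8 + [True, False]   # "Sang in a college choir" only
composer = [False, True] + [False] * 8     # "Composed music ..." only
print(score_musical(choir_only))  # (1, 0)
print(score_musical(composer))    # (1, 1)
```

A student who only sang in a college choir therefore contributes to the total scale but not to the rare scale, which is the distinction drawn between the two scales later in this section.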

The correlates of musical achievement are 
shown in Table 12; those of rare musical 
achievement are shown in Table 13. Both 
scales were related to achievement in other 
artistic areas both concurrently and predic- 
tively: to achievement in the graphic arts in 
college, to artistic achievement in high school, 
and, for males, to literary achievement in col- 
lege. The correlation of these scales with the 
total Artistic Achievement scale is to be ex- 
pected because of item overlap; however, the 
correlations were significantly higher for the 
Rare Musical Achievement scale than for the 
total Musical Achievement scale, even though 
there was less item overlap. 

With the exception of the correlations with 
other artistic achievements noted above, the 
correlates of musical achievement are some- 
what different from those of rare musical 
achievement. The Rare Musical Achievement 
scale also had appreciably more correlates. 

TABLE 12

CORRELATES OF MUSICAL ACHIEVEMENT DURING THE
FIRST YEAR OF COLLEGE

                                                Correlation
Variable                                        Males        Females
Standard error of r                              .09          .08
Other criteria
  Leadership achievement                         .20          .38
  Rare leadership achievement                    .17          .23
  Literary achievement                           .22          .19
  Rare achievement in music (item overlap)       .86          .58
  Achievement in graphic art                     .26          .14
  Total artistic achievement                     .78          .82
Predictors
  Originality and artistic interest
     5. Artistic achievement in high school      .21          .19
    30. Feeling versus Thinking scale            .21          .10
    37. Preference for artistic occupations      .22         -.03
    44. Number of creative activities enjoyed    .21         -.03
   103. Expressiveness, father’s rating          .21          .03
  Other predictors
   131. Father’s laissez-faire child rearing
        attitudes                                .03          .22
   132. Mother’s laissez-faire child rearing
        attitudes                                .21          .07
    79. Independence, mother’s rating            .00          .26
    88. Emotional stability, father’s rating    -.01          .22
    90. Leadership, father’s rating             -.03          .22
    22. Mastery scale                           -.22          .05


Rare musical achievement was predicted by
a number of measures of creativity and origi-
nality for both sexes. It was significantly
related to breadth of creative activities,
mother’s rating of originality and expressive-
ness, preference for artistic occupations, and
rejection of realistic occupations; and to the
Preconscious Activity and Perceptive versus
Judging scales. It was also related to the
Complexity-Simplicity scale, the Feeling ver-
sus Thinking scale, and the father’s rating of
expressiveness. For males, it was negatively
related to the Super-Ego Strength scale.
There seem to be some real sex differences
in the predictors of rare musical achievement.
A number of measures of social ascendency
and drive were related to rare musical
achievement in females but not in males.
Rare musical achievement for females was
predicted by mother’s ratings of leadership,
popularity, athletic ability, and neatness; and
by the self-rating of drive to achieve. It was
also related to scores on the Risk Taking
scale, which consists of a number of bio-
graphical items concerning experiences which
presuppose independence and self-confidence.
Among males, on the other hand, rare musical
achievement was related positively to the
Femininity and Acquiescence scales and nega-
tively to mother’s rating of athletic ability.


TABLE 13

CORRELATES OF RARE MUSICAL ACHIEVEMENT DURING THE FIRST YEAR OF COLLEGE

                                                Correlation
Variable                                        Males        Females
Standard error of r                              .09          .13
Other criteria
  Literary achievement                           .23          .01
  Musical achievement (item overlap)             .86          .58
  Achievement in graphic art                     .28          .13
  Total artistic achievement (item overlap)      .91          .93
Predictors
  Originality and artistic interest
     5. Artistic achievement in high school      .35          .27
     8. Breadth of creative activity             .23          .25
    16. Preconscious Activity scale              .28          .02
    21. Complexity-Simplicity scale              .28          .06
    30. Feeling versus Thinking scale            .25         -.06
    31. Perceptive versus Judging scale          .25          .17
    32. Interest in realistic occupations       -.25         -.21
    37. Interest in artistic occupations         .38          .32
    68. Originality, mother’s rating             .18          .38
   103. Expressiveness, father’s rating          .28          .04
  Social ascendency and drive
    27. Risk Taking scale                        .07          .36
    52. Drive to achieve, self-rating            .03          .32
    69. Leadership, mother’s rating              .02          .36
    70. Popularity, mother’s rating             -.08          .40
    71. Athletic ability, mother’s rating       -.15          .40
    77. Neatness, mother’s rating                .00          .36
     9. Length of free-response comment          .05         -.40
  Other predictors
    11. Femininity scale                         .27         [illegible]
    18. Acquiescence scale                       .27         [illegible]
    25. Super-Ego Strength scale                -.28          .06
    66. Number of mother’s ratings above
        average                                 -.08          .40
    79. Independence, mother’s rating           -.17          .32
   114. Educational difference between parents
        (father minus mother)                    .28         -.21
   130. Authoritarian child rearing attitude
        difference between parents (father
        minus mother)                            .20          .23

The Rare Musical Achievement scale was 
also predicted for both sexes by two varia- 
bles involving differences between the two 
parents. If the father’s child rearing attitudes
were more authoritarian than the mother’s, 
both males and females were more apt to 
show rare musical achievement. If the fa- 
ther’s education was high relative to the 
mother’s, a son was more likely to show rare 
musical achievement; whereas rare musical 
achievement in daughters was more likely 


when the mother’s education was high rela- 
tive to the father’s. 
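
The two parental predictors discussed above are difference scores (father minus mother). A minimal sketch of how such a score might be formed and correlated with a dichotomous criterion follows. The data are invented for illustration, and the names `father_minus_mother`, `father_education`, `mother_education`, and `rare_musical` are hypothetical; nothing here reproduces the study's actual computations.

```python
def father_minus_mother(father, mother):
    """Difference score for each family: father's value minus mother's."""
    return [f - m for f, m in zip(father, mother)]


def pearson(x, y):
    """Product-moment correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Invented data for four families (years of schooling; not from the study).
father_education = [16, 12, 14, 12]
mother_education = [12, 12, 16, 14]
rare_musical = [1, 0, 0, 0]   # dichotomized criterion for the four sons

difference = father_minus_mother(father_education, mother_education)
r = pearson(difference, rare_musical)
print(difference)    # [4, 0, -2, -2]
print(round(r, 2))   # 0.94
```

A positive coefficient of this kind corresponds to the pattern reported for sons, where achievement was more likely when the father's education was high relative to the mother's; the reverse sign corresponds to the pattern reported for daughters.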

The Musical Achievement scale differs 
from the Rare Musical Achievement scale in 
that most of the variance was contributed 
by the item “Sang in a college choir,” which 
does not appear in the rare scale. As noted 
above, this scale was related to achievement 
in other artistic areas, and also to both the 
Leadership Achievement scales for both sexes. 
Among male subjects it was predicted by in- 
terest in artistic occupations, and the number 
of activities enjoyed; it was related positively 
to the Feeling versus Thinking scale and 
negatively to the Mastery scale. Among fe- 
males the Musical Achievement scale was 
related to mother’s rating of independence 
and to father’s rating of emotional stability 
and leadership. Mother’s laissez-faire child 
rearing attitudes were related to the Musical 




Achievement scale for male subjects; similar 
attitudes on the part of the father were 
predictive for female subjects. 
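
Several of the sex differences noted in this section can be gauged roughly from the tabled values. The sketch below is an assumption of this edition, not a test reported by the authors: it treats the male and female samples as independent, uses the standard errors of r given in Table 12 (.09 and .08), and forms a simple critical ratio for the difference between the two Mastery scale correlations (-.22 for males, .05 for females). A Fisher r-to-z transformation would be the more usual choice when the sample sizes are known.

```python
import math


def difference_z(r_male, se_male, r_female, se_female):
    """Critical ratio for the difference between two independent correlations,
    using the tabled standard errors as an approximation."""
    return (r_male - r_female) / math.sqrt(se_male ** 2 + se_female ** 2)


# Mastery scale, Table 12: males -.22 (SE .09), females .05 (SE .08).
z = difference_z(-0.22, 0.09, 0.05, 0.08)
print(round(z, 2))  # -2.24
```

Under these assumptions a ratio beyond about 2 in absolute value would suggest that the male and female coefficients differ by more than sampling error alone.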


Achievement in Graphic Art 


Seven items on the check list of achieve- 
ments pertained to achievement in graphic 
art. The total score was highly skewed and 
was therefore dichotomized as one or more 
achievements checked versus none checked. 
Seventeen percent of both the males and fe- 
males checked one or more of the Graphic 
Art Achievement items. The items and the 
percentage of the total sample checking each 
were as follows: 

Won a prize or award in an art competition 
(painting, sculpture, ceramics, etc.) (.2%)

Had photographs, drawings, or other art work ex- 
hibited or published (3.1%) 

Exhibited or published at my college a work of 
art (painting, musical composition, sculpture, etc.) 


(2.2%) 
Exhibited or performed (not at my college) a work 
of art (painting, musical composition, sculpture) 


(1.5%) 


Finished a work of art (painting, musical composi- 
tion, sculpture) on my own (not as part of a course) 


(9.5%) 


Entered an artistic competition or contest of any 


kind (1.3%) 
Produced one or more works of art (not as part 
of a course) (6.6%) 


Since the first four of the above items oc- 
cur rarely and involve some degree of public 
recognition of artistic ability, they were 
scored separately as a Rare Graphic Art 
Achievement scale. 

At the time that the check list was con- 
structed, the division of the various artistic 
achievements into scales was not clearly con- 
ceptualized; as a result, three of the above 
items specifically mention musical composi- 
tion in addition to achievement in graphic art. 
This error may introduce some spurious cor- 
relation with the Musical Achievement scales. 
However, since the correlations with the 
Musical Achievement scales were not high 
and were not even significant in the case of 
rare achievement in graphic art, the error is 
not a grave one. 

The correlates of achievement in graphic 
art are shown in Table 14; those of rare 
achievement in graphic art are shown in 
Table 15. The correlates of the two levels 


of achievement in graphic art were quite 
similar. Both the Graphic Art Achievement 
and Rare Graphic Art Achievement scales 
were related to both of the Leadership 
Achievement scales, the Literary Achieve-
ment scale, and the total Artistic
Achievement scale for both sexes. There is
item overlap with the total Artistic
Achievement scale; however, the correlations
were much higher than would be expected on 
this basis alone. In addition, the Graphic 
Art Achievement scale was related to both
Musical Achievement scales for both sexes 
and to both Dramatic Achievement scales 
for male subjects. 

Achievement in graphic art was predicted 
by a number of measures of originality and 
expressiveness for both sexes, though this was 
truer of the total Graphic Art scale than for 
the Rare Achievement scale. The Graphic Art 
scale was predicted by the Complexity-Sim- 
plicity scale and the Intuition versus Sensa- 
tion scale for both sexes. Among males, it 
was also related to the Feeling versus Think- 
ing scale. In addition, graphic art achieve- 
ment was related for both sexes to ratings of 
originality by the mother, the father, and the 
student himself and to the self-rating of ex- 
pressiveness. Graphic art achievement was 
also related to the number of creative activi- 
ties enjoyed. Among male subjects it was 
predicted by the length and judged origi- 
nality of the free-response comment on a so- 
cial problem; it was negatively related to 
interest in conventional occupations and to 
the desire to emulate scientists. 

Among male subjects, both graphic art 
scales were negatively related to rank in high 
school class, and to the mother’s and father’s 
ratings of athletic ability, although the moth- 
er’s rating was not significantly associated 
with the Rare Achievement scale. Both scales 
were also related to the Femininity scale. 

The Rare Achievement scale was not as
predictable from the personality measures as
the total Graphic Art scale was. Among the
measures of originality which were related to
the total scale, only the Preconscious Ac-
tivity scale was related to rare achievement
for both sexes; originality of the free-re-
sponse comment was also related for male
subjects. Rare achievement was related to
several measures of lack of dependability and
drive. As noted above, it was negatively re-
lated to high school rank. It was also nega-
tively related to mother’s rating of drive to
achieve in both sexes. Among males it was
negatively related to mother’s rating of de-
pendability. Unexpectedly, rare graphic art
achievement was related to the self-rating of
emotional stability for both sexes. For fe-
male subjects the Rare Graphic Art Achieve-
ment scale seems also to be related to breadth
of activity and interest as evidenced by its
correlations with the Risk Taking scale and
the number of activities checked as enjoyed.


TABLE 14

CORRELATES OF ACHIEVEMENT IN GRAPHIC ART DURING THE FIRST YEAR OF COLLEGE

                                                Correlation
Variable                                        Males        Females
Standard error of r                              .09          .09
Other criteria
  Leadership achievement                         .29          .25
  Achievement in dramatics                       .28         -.06
  Literary achievement                           .37          .29
  Rare musical achievement                       .26          .14
  Musical achievement                            .28          .13
  Rare achievement in graphic art
    (item overlap)                               .94          .71
  Total artistic achievement (item overlap)      .90          .94
Predictors
  Originality and artistic interests
     5. Artistic achievement in high school     [illegible]  [illegible]
     8. Breadth of creative activity             .40          .30
    10. Originality of free-response comment    [illegible]  [illegible]
     9. Length of free-response comment          .30          .04
    16. Preconscious Activity scale              .39          .24
    20. Intolerance of Ambiguity scale          [illegible]  [illegible]
    21. Complexity-Simplicity scale              .27          .15
    29. Intuition versus Sensation scale         .21          .18
    30. Feeling versus Thinking scale            .22          .03
    37. Preference for artistic occupations      .40          .24
    35. Preference for conventional occupations -.24          .07
    42. Desire to emulate the lives of
        scientists                              -.31         -.05
    44. Number of creative activities enjoyed    .22          .18
    47. Originality, self-rating                 .39          .19
    68. Originality, mother’s rating             .19          .24
    89. Originality, father’s rating             .30          .12
    61. Expressiveness, self-rating              .24          .28
  Parental attitudes and values
   119. Father’s preference for conventional
        occupations                             [illegible]  [illegible]
   131. Father’s laissez-faire child rearing
        attitudes                               [illegible]  [illegible]
   134. Father’s Intolerance of Ambiguity scale [illegible]  -.36
   149. Mother’s desire that the child be
        dependable and reliable                 [illegible]  [illegible]
   140. Father’s desire that the child be
        dependable and reliable                 -.09         -.22
   150. Mother’s desire that the child be a
        good student                            -.18         -.18
   141. Father’s desire that the child be a
        good student                            [illegible]  [illegible]
   143. Father’s desire that the child be
        independent and self-reliant            [illegible]  [illegible]
   148. Mother’s desire that the child be
        curious                                 [illegible]  [illegible]
  Other predictors
     3. High school rank                        -.30          .04
    11. Femininity scale                         .33         -.04
    15. Superego scale                           .01         -.24
    48. Leadership, self-rating                 -.07          .25
    64. Self-understanding, self-rating          .24          .12
    71. Athletic ability, mother’s rating       -.22         -.13
    92. Athletic ability, father’s rating       -.30          .04
    79. Independence, mother’s rating           -.19          .18

Both the rare and total achievement scales
in graphic art were more highly related to
the parental attitudes and value measures than
were any of the other criteria. Although the
two achievement scales were correlated with
different parental variables, the content of
these correlates is consistent in suggesting
that the parents of the artistic achiever en-
courage independence and lack of conformity.
The Graphic Art Achievement scale was
positively correlated with the father’s Laissez-
Faire Child Rearing Attitude scale scores and
negatively related to the father’s Intolerance
of Ambiguity scale scores. The Graphic Art
Achievement scale was negatively related to
both mother’s and father’s expressed desire
that their child be dependable and reliable,
and it was positively related to the father’s
desire that the child be independent and self-
reliant. All these correlations were significant
for both sexes. For female subjects graphic
art achievement was negatively related to the
father’s desire that his daughter be a good
student and that she be curious.


TABLE 15

CORRELATES OF RARE ACHIEVEMENT IN GRAPHIC ART DURING THE FIRST YEAR OF COLLEGE

                                                Correlation
Variable                                        Males        Females
Standard error of r                              .11         [illegible]
Other criteria
  Leadership achievement                         .25          .08
  [one or more rows illegible]
  Achievement in graphic art (item overlap)      .94          .71
  Total artistic achievement (item overlap)      .92          .65
Predictors
  Originality and artistic interests
     5. Artistic achievement in high school      .20          .32
     8. Breadth of creative activity             .28          .33
    10. Originality of free-response comment
        on a social problem                      .37         -.08
    16. Preconscious Activity scale              .25          .15
    31. Perceptive versus Judging scale         [illegible]   .48
    37. Preference for artistic occupations      .28          .40
    42. Desire to emulate the lives of
        scientists                              -.25         -.06
    47. Originality, self-rating                 .28          .38
  Parental attitudes and values
   119. Father’s preference for conventional
        occupations                             -.22         -.10
   148. Mother’s desire that the child be
        curious                                  .30         -.05
   150. Mother’s desire that the child be a
        good student                            -.30         -.18
   152. Mother’s desire that the child be
        independent and self-reliant            [illegible]  [illegible]
   142. Father’s desire that the child be
        happy and well-adjusted                 [illegible]  [illegible]
  Lack of drive and dependability
    73. Drive to achieve, mother’s rating       -.23         -.33
    72. Dependability, mother’s rating          -.03         -.38
    93. Dependability, father’s rating          -.04         -.38
     3. High school rank                        -.32          .10
    11. Femininity scale                         .27          .23
    79. Independence, mother’s rating           -.28         -.03
  Other predictors
    43. Number of activities enjoyed            -.08          .43
    92. Athletic ability, father’s rating       -.27          .18
    27. Risk Taking scale                        .03          .63
   133. Difference between parents in
        laissez-faire child rearing attitudes
        (father minus mother)                   -.08         -.40

For both sexes the Rare Graphic Art 
Achievement scale was negatively related to 


the mother’s and father’s desire that their 
child be happy and well-adjusted; for male 
subjects it was negatively related to the 
mother’s goal that he be popular. For female 
subjects the father’s laissez-faire attitude 
score, when considered relative to the mother’s 
score, was predictive of rare graphic art 
achievement. 


Total Artistic Achievement 


The total Artistic Achievement scale was 
the sum of all the items for the dramatic, 
literary, musical, and graphic art scales. Since 
these scales all deal with artistic accomplish- 
ment and since they were positively cor- 
related, it was considered meaningful to 
score these items as a single scale. The score
distribution on this total scale was fairly
symmetrical so it was not dichotomized or 
otherwise transformed. The total Artistic 
Achievement scale was highly correlated 
with all of the other artistic scales with 
the exception of the literary scales, but 
these correlations were spuriously high due 
to the item overlap. For this reason the low 
correlations of the literary scales with the 


total Artistic Achievement scale are surprising. 
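
As a concrete illustration of how the total score is formed, the sketch below sums the checked items across the four component check lists, using the item counts given in the preceding sections (4 dramatic, 7 literary, 10 musical, and 7 graphic art items). The function name, the dictionary layout, and the example student are hypothetical; unlike the component scales, the sum is left undichotomized.

```python
# Number of check-list items in each component scale, as stated in the text.
SCALE_LENGTHS = {"dramatic": 4, "literary": 7, "musical": 10, "graphic_art": 7}


def total_artistic_score(checks):
    """Sum of all checked items across the four artistic scales.

    `checks` maps each scale name to a list of booleans, one per item.
    The result is a count from 0 to 28 and is not dichotomized.
    """
    total = 0
    for scale, n_items in SCALE_LENGTHS.items():
        items = checks[scale]
        if len(items) != n_items:
            raise ValueError(f"{scale}: expected {n_items} items")
        total += sum(1 for checked in items if checked)
    return total


# Hypothetical student: unpublished writing, college choir, one work of art.
student = {
    "dramatic": [False] * 4,
    "literary": [False] * 6 + [True],
    "musical": [False] * 8 + [True, False],
    "graphic_art": [False] * 4 + [True, False, False],
}
print(total_artistic_score(student))  # 3
```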

The correlates of the total Artistic Achieve-
ment scale are shown in Table 16. Artistic
achievement was more highly related to the
predictor variables in this study than any
of the other achievement criteria. Forty-three
predictors had significant correlations. The
correlates of the total Artistic Achievement
scale seem to fall into three categories:
measures of originality and sensitivity, meas-
ures of artistic interest and accomplishment,
and measures of general activity and social
dominance.


TABLE 16

CORRELATES OF THE TOTAL ARTISTIC ACHIEVEMENT SCORE

                                                Correlation
Variable                                        Males        Females
Standard error of r                              .06          .06
Other criteria
  Leadership achievement                         .26          .18
  Achievement in dramatics (item overlap)        .70          .48
  Literary achievement (item overlap)            .27          .22
  Musical achievement (item overlap)             .78          .82
  Rare musical achievement (item overlap)        .91          .93
  Achievement in graphic art (item overlap)      .90          .94
  Rare achievement in graphic art
    (item overlap)                               .92          .65
Predictors
  Originality and artistic interests
     5. Artistic achievement in high school     [illegible]  [illegible]
     8. Breadth of creative activity             .32          .26
    10. Originality of free-response comment     .20         -.05
     9. Length of free-response comment          .16         [illegible]
    16. Preconscious Activity scale              .24         [illegible]
    20. Intolerance of Ambiguity scale          -.17         -.13
    21. Complexity-Simplicity scale              .19          .06
    30. Feeling versus Thinking scale            .15          .10
    31. Perceptive versus Judging scale         [illegible]  [illegible]
    37. Preference for artistic occupations      .29          .20
    35. Preference for conventional occupations -.13         -.17
    42. Desire to emulate the lives of
        scientists                              -.17         -.04
    44. Number of creative activities enjoyed    .20          .15
    47. Originality, self-rating                 .19          .16
    68. Originality, mother’s rating             .11         [illegible]
    89. Originality, father’s rating             .14          .18
    61. Expressiveness, self-rating              .17          .24
  Social dominance
     7. Elective offices in high school          .02          .17
    13. Social Presence scale                    .01          .19
    19. Sense of Destiny scale                   .08          .20
    24. Dominance scale                         [illegible]  [illegible]
    27. Risk Taking scale                       [illegible]  [illegible]
    28. Introversion versus Extraversion scale  [illegible]  [illegible]
    48. Leadership, self-rating                  .06          .25
    90. Leadership, father’s rating              .04          .23
    54. Sociability, self-rating                 .07          .18
  Other predictors
     3. High school rank                        -.19          .04
    11. Femininity scale                         .19         -.03
    51. Dependability, self-rating              -.15         -.02
    64. Self-understanding, self-rating          .18          .06
    85. Self-understanding, mother’s rating     -.14          .13
    79. Independence, mother’s rating           -.13          .17
    92. Athletic ability, father’s rating       -.17          .10
   114. Educational difference between parents
        (father minus mother)                   -.19          .10
   131. Father’s laissez-faire child rearing
        attitudes                                .07         [illegible]
   134. Father’s intolerance for ambiguity      -.08          .21
   112. Father’s education                       .18         -.01

The highest single predictive relationship 
for both sexes was with breadth of creative 
activity in high school. Other measures of 
originality and sensitivity which were sig- 
nificantly related to artistic achievement for 
both sexes were all three ratings of origi- 
nality, the self-rating of expressiveness, the 
Preconscious Activity scale, and the number 
of creative activities enjoyed. The Intoler- 
ance of Ambiguity scale was negatively related 
to artistic achievement for both sexes. For male 
subjects the judged originality of the free-response
comment and the Complexity versus
Simplicity and Feeling versus Thinking scales
were positively related to artistic achievement.
For female subjects the Perceptive
versus Judging scale was similarly related. 

Among the measures of artistic interest 
and accomplishment, the highest correlation
for both sexes was with a similar scale of
artistic achievement in high school. Other 
measures which were significant for both sexes 
were: expressed preference for artistic occu- 
pations and rejection of conventional occupa- 
tions. For male subjects there was a negative 
relationship with the desire to emulate the 
lives of scientists. 

The correlates of artistic achievement 
which appear to be measures of social domi- 
nance were for the most part significant only 
for the female sample. These variables include 
the number of elective offices held in high 
school; self-ratings and father’s ratings of 
leadership; self-ratings of sociability; and the 
Social Presence, Sense of Destiny and Risk 
Taking scales. The Introversion versus Ex- 
traversion scale was negatively correlated 
with artistic achievement for the female 
sample. 

Although the Dominance scale was posi- 
tively related to artistic achievement for both 
sexes, a number of other predictors suggest 
that the social dominance which is positively 
related to artistic achievement among females 
is negatively related to artistic achievement 
among males. In the male sample artistic 
achievement was positively related to the
Femininity scale and negatively to high school
rank and the self-rating of dependability; for 
males the mother’s rating of independence and 
the father’s rating of athletic ability were also 
negatively related to artistic achievement, al- 
though both these variables were positively 
related to this criterion in the female sample. 


PhD Aspiration 


Stated intention to seek the PhD when ob- 
tained at the end of the freshman year at 
college is a measure of scholastic motivation 
still largely untested against the trials of col- 
lege education. Undoubtedly many students 
will change their plans before actually ob- 
taining the degree. However, PhD aspira- 
tion at the end of the freshman year gives 
some indication of which bright students 
start out with intentions of obtaining the PhD,
and it is interesting to see how they differ 
from bright students who do not. PhD 
aspiration was dichotomized as those seeking 
the PhD versus those seeking a lower degree. 
Students seeking medical, law, or other profes- 
sional degrees were omitted from the calcula- 
tions. 

PhD aspiration at the end of the fresh- 
man year at college was highly correlated 
with a similar measure obtained at high school 
graduation. The tetrachoric coefficients were 
.89 for males and .84 for females. However, 
it seems desirable to use the measure obtained 
at the end of the freshman year, since this 
measure is closer in time to the actual PhD 
criterion and since all relationships are pre- 
dictive. 

The correlates of PhD aspiration are shown 
in Table 17. PhD aspiration was related to 
aptitude even in this highly select sample. It 
was correlated with SAT-Verbal for both 
males and females and with SAT-Mathemati- 
cal at the .05 level for males. 

Several of the correlates suggest that as- 
pirants have greater motivation to achieve 
and that they achieve a higher level than 
those who plan to get lower degrees. PhD 
aspiration was related to college grades, to 
scientific and literary achievement in college, 
to scientific achievement in high school, and 
to the desire to emulate scientists. The Persistence
and Mastery scales were correlated
with PhD aspiration, as were self-ratings of
drive to achieve and perseverance, and father’s
rating of perseverance.


TABLE 17

CORRELATES OF PHD ASPIRATION AT THE END OF THE FIRST YEAR OF COLLEGE

                                                        Correlation
Variable                                             Males      Females

Standard error of r                                   .09         .09

Other criteria
  First-year grades                                   .34         .14
  Scientific achievement                          [illegible]     .23
  Literary achievement                                .34         .28

Predictors
  Achievement motivation and [illegible]
    41. PhD aspiration in high school                 .89         .84
     6. Scientific achievement in high school         .15         .17
    22. Mastery scale                                 .20         .25
    23. Deferred Gratification scale                  .08         .21
    14. Persistence scale                             .29         .30
    52. Drive to achieve, self-rating                 .33         .22
    73. Drive to achieve, mother’s rating             .23         .08
    53. Scholarship, self-rating                      .27         .03
    95. Scholarship, father’s rating                  .22        −.03
        [two rows illegible]
   107. Perseverance, father’s rating                 .15         .18
  Originality and creativity
     8. Breadth of creative activity                  .18         .36
    10. Originality of free-response comment         −.01         .26
    16. Preconscious Activity scale                   .25         .40
    20. Intolerance of Ambiguity scale               −.18        −.25
    21. Complexity-Simplicity scale                   .14         .35
    29. Intuition versus Sensation scale              .14         .37
    30. Feeling versus Thinking scale                 .05        −.21
    40. Course most enjoyed was one which
        stimulated creative thinking                 −.18         .32
    44. Number of creative activities enjoyed         .24         .30
        [rows illegible]
    89. Originality, father’s rating                  .20     [illegible]
    61. Expressiveness, self-rating                   .14         .28
  Vocational interests
    32. Preference for realistic occupations      [illegible]     .08
    33. Preference for intellectual occupations       .36         .26
    34. Preference for social occupations         [illegible] [illegible]
    35. Preference for conventional occupations      −.23        −.24
    36. Preference for enterprising occupations      −.13        −.22
    37. Preference for artistic occupations       [illegible]     .29
    42. Desire to emulate the lives of scientists     .16         .24
  Other predictors
    11. Femininity scale                              .24        −.06
    26. Radicalism scale                              .19         .14
    43. Number of activities enjoyed                  .20         .14
    45. Number of self-ratings [rest of label
        illegible]                                    .18         .19
    64. Self-understanding, self-rating               .20         .18
   106. Self-understanding, father’s rating           .10         .25
    81. Practical-mindedness, mother’s rating        −.20        −.14
   114. Educational difference between parents
        (father minus mother)                         .25     [illegible]
   117. Father’s interest in intellectual
        occupations                                   .22         .10
   131. Father’s laissez-faire child rearing
        attitudes                                     .23         .18
   138. Father’s desire that the child be
        ambitious                                     .03         .24
   133. Parental difference on laissez-faire
        child rearing attitudes (father minus
        mother)                                       .19         .10

That PhD aspiration is also related to
originality and intuitive sensitivity is indicated
by its positive correlations with the
Preconscious Activity, Complexity-Simplicity,
Radicalism, and Intuition versus Sensation
scales; with self-ratings of originality,
self-understanding, and expressiveness; and with
father’s rating of originality; and by
its negative correlations with the Intolerance
of Ambiguity scale and mother’s rating of
practical-mindedness.

The correlation of PhD aspiration with 
breadth of creative activities, number of ac- 
tivities enjoyed, and number of creative ac- 
tivities enjoyed suggests that the PhD 
aspirant tends to have wider interests than 
the nonaspirant. 

There are some sex differences apparent 
in the correlates of PhD aspiration. Male 
PhD aspirants tend to be more feminine, as indicated
by the Femininity scale, and more oriented
toward scholarship, as indicated by self-,
mother’s, and father’s ratings of scholarship,
than are those aspiring to lower degrees.

Female PhD aspirants seem to be more 
original, as indicated by the judged originality 
of the free-response comment and their re- 
port that the high school class they most en- 
joyed was one which stimulated creative 
thinking. They also scored higher on the De- 
ferred Gratification scale and lower on the 
Feeling versus Thinking scale than the as- 
pirants to lower degrees. 

Interest in intellectual occupations was 
positively correlated with PhD aspiration for 
males and negatively for females, while in- 
terest in social occupations was negatively re- 
lated to PhD aspiration for males and posi- 
tively for females. It seems probable that if 
one’s interests in particular occupations are 
appropriate to roles traditional for one’s sex, 
one is more likely to aspire to higher educa- 
tional levels in that particular occupation. 


Effectiveness of the Predictor Variables 


The predictors used in the present study 
varied greatly in the degree to which they 
were related to subsequent achievement. 
Future predictive studies may benefit from a 
consideration of the type of predictor which 
proved most effective for the present criteria. 

Table 18 shows the percentage of the predictor-criterion
correlations which were significant
for the various groups of predictors.
About 2% of the correlations in each group
of predictors might be expected to attain the
required level of significance by chance. As
can be seen from Table 18, in the poorest
group of predictors, 7% of the correlations
were significant; thus, all groups of predictors
were at least minimally related to the criteria.


TABLE 18

PERCENTAGE OF SIGNIFICANT CORRELATIONS OF THE
VARIOUS TYPES OF PREDICTOR VARIABLES WITH
THE ACHIEVEMENT CRITERIA

                                               Percentage of
Type of predictor variable                     correlations
                                               significant a

High school achievement                             33
Interests and attitudes                             27
Personality inventory scales                        24
Self-ratings                                        12
Mother’s ratings                                    14
Father’s ratings                                    14
Parental attitudes                                  10
Family background variables                          8
Parental interests                                   7
Parental goals and aspirations for the student       7
Aptitude test                                        7

a Significant at .01 level for either sex or at .05 level for
both sexes.
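
The "about 2%" chance expectation can be checked with a few lines of arithmetic. The sketch below assumes that the null hypothesis holds for both sexes and that the male and female samples are independent; neither assumption is stated in the text, and the function name is illustrative only. A correlation counts as significant if it reaches the .01 level for either sex or the .05 level for both.

```python
def chance_rate(alpha_either=0.01, alpha_both=0.05):
    """Probability that a null correlation meets the Table 18 criterion.

    Assumes independent male and female samples (an assumption of this
    sketch, not a statement from the monograph).
    """
    # At least one sex reaches the stricter .01 level.
    p_either = 1 - (1 - alpha_either) ** 2
    # Neither sex reaches .01, but both fall between .01 and .05.
    p_both_only = (alpha_both - alpha_either) ** 2
    return p_either + p_both_only


rate = chance_rate()
print(f"expected chance rate: {rate:.4f}")
```

Under these assumptions the rate comes to a little over 2%, which agrees with the figure given in the text.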

The best group of predictors were those 
based on high school achievement: 33% of 
the correlations were significant. Next in ef- 
fectiveness were the interest and attitude 
measures. The number of creative activities 
enjoyed was a very effective predictor, as was 
the item concerned with emulating scientists. 
Preferences for occupations were also fairly 
good predictors of achievement. Preference 
for artistic occupations was particularly ef- 
fective in predicting artistic achievement. 

The personality inventory scales were next 
in order of effectiveness; and no scale failed 
to correlate significantly with at least one of 
the criteria. 

After the inventory scales there was a large 
drop in effectiveness to the rating variables. 
The mother’s and father’s ratings were slightly 
better predictors than the self-ratings by the 
student. 

Last came measures of parents’ interests,
attitudes, and goals; family background variables;
and aptitude measures.


Possible Causative Relationships 


Although the parental attitude and background
variables were much less closely related
to the achievement criteria than were
the high school achievement and personality 
variables, they were related far beyond chance 
expectancy. From a theoretical point of view, 
they are of greater importance than the other 
more highly correlated predictors because 
they point to possible causative factors which 
determine achievement. Thus, it is relevant 
to attempt to find patterns in the correla- 
tions of these variables with the achievement 
criteria. 

Generally, parental preferences for realis- 
tic, conventional, and social occupations were 
negatively related to artistic achievement; 
these relationships were more pronounced for 
the father than for the mother. MacKinnon 
(1960) and Holland (1962) have shown that 
people with these occupational interests tend 
to be practical and uninterested in esthetic, 
theoretical, and intellectual matters. It may 
be that their lack of interest in these matters 
inhibits the development of artistic talent in 
their children. 

Laissez-faire child rearing attitudes on the 
part of the parents were, in general, positively 
related to artistic achievement. Authoritarian 
attitudes tended to be negatively related to 
artistic achievement, with the exception that 
authoritarian attitudes in the father were posi- 
tively related to rare literary achievement for 
girls. Intolerance for ambiguity was, in general, 
negatively related to artistic achievement in 
boys and positively in girls, with the excep- 
tion that it was negatively related to graphic 
art achievement in girls. 

The parents’ desire that the child be con- 
forming or possess socially desirable traits 
(happy and well adjusted, dependable and 
reliable, curious, good student, and self-con- 
trolled) was, in general, negatively related to 
achievement. On the other hand, the parents’ 
desire that the child be independent and ag- 
gressive (independent and self-reliant, able to 
defend self, and ambitious) was, in general, 
positively related to achievement. 

The educational level of the parents also 
appears to be positively related to achieve- 
ment. If the father’s education was high 
relative to that of the mother, the achieve- 
ment of boys tended to be greater, whereas 
achievement in girls depended on the
mother’s having a high education relative to
that of the father. 

Two major themes seem to run through 
the correlations discussed above. First, when 
the parents are not interested in a particular 
area (specifically art in this study) the child’s 
achievement in that area is inhibited. Secondly,
parental press for independence is
positively related to achievement and parental 
press for conformity is negatively related. 

One surprising finding is that the variables 
obtained from the father were more highly 
related to the achievement criteria than were 
those obtained from the mother. The 31 significant
correlations of father’s responses with
the achievement criteria were significantly
more numerous than the 16 significant correlations
obtained with mother’s responses (χ² = 4.74,
p < .05). Although many people have assumed
that the attitudes and values of the
mother are more significant in the develop- 
ment of the child, there is little evidence on 
this point. 
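
The comparison of 31 with 16 significant correlations can be illustrated with a one-degree-of-freedom chi-square test. The monograph does not say how its value of 4.74 was computed, so the sketch below is only one plausible reading: it assumes that the father and mother variable sets were the same size, so that the 47 significant correlations would be expected to split evenly. The function name is illustrative.

```python
import math


def chi_square_equal_split(count_a, count_b):
    """Goodness-of-fit chi-square (1 df) against an even split.

    The even-split expectation is an assumption of this sketch.
    """
    expected = (count_a + count_b) / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # Upper-tail probability of a chi-square variate with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = chi_square_equal_split(31, 16)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

This reading gives a value close to, but not identical with, the reported 4.74, and it falls below the .05 level as the text states. The small difference suggests the authors used a somewhat different layout of the counts.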

There was a tendency for the achievement 
of males to be more predictable than that 
of females. Of the correlations which were 
significant for at least one sex the correlation 
for the male sample was larger than that for 
the female sample 57% of the time, a figure 
which is significantly different from 50%
(p < .05). However, the correlations of the 
predictors with dramatic and literary achieve- 
ments tended to be higher for females. 


Multiple Correlations


Samples of 322 male and 164 female sub- 
jects for whom complete predictor and cri- 
terion data were available were selected from 
the total pool of subjects. Using these sam- 
ples the 39 most promising predictor varia- 
bles and the criterion measures for grades, 
leadership, and scientific and artistic achieve- 
ment were intercorrelated. Using the Wherry- 
Doolittle procedure, multiple correlations 
were calculated between the best subset of 
predictor variables and each of the four cri- 
teria. These calculations were done sepa- 
rately for males and females. For these calcu- 
lations the leadership and science criteria 
were dichotomized and the coefficients are 
multiple biserial correlations. 
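
The computational details of the Wherry-Doolittle procedure are not given here, so the sketch below substitutes plain greedy forward selection to show the general idea: at each step the predictor that most raises the multiple correlation is added, and selection stops when the F ratio for the reduction in residual variance falls below a threshold. The synthetic data, the function names, and the threshold of 3.9 (roughly the .05 point for these sample sizes) are all assumptions of this illustration, not the authors' method or data.

```python
import numpy as np


def multiple_r(X, y):
    """Multiple correlation of y with a least-squares fit on the columns of X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.corrcoef(design @ coef, y)[0, 1])


def forward_select(X, y, f_to_enter=3.9):
    """Greedy forward selection; returns chosen columns and (column, F, R) steps."""
    n, p = X.shape
    chosen, steps, r_prev = [], [], 0.0
    while len(chosen) < p:
        # Try each unused predictor alongside those already chosen.
        trials = [(multiple_r(X[:, chosen + [j]], y), j)
                  for j in range(p) if j not in chosen]
        r_new, j_best = max(trials)
        # Residual df after the candidate enters: n - (k + 1) - 1.
        df_resid = n - len(chosen) - 2
        f_ratio = (r_new ** 2 - r_prev ** 2) * df_resid / (1.0 - r_new ** 2)
        if f_ratio < f_to_enter:
            break
        chosen.append(j_best)
        steps.append((j_best, f_ratio, r_new))
        r_prev = r_new
    return chosen, steps


# Synthetic illustration: only columns 0 and 1 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((322, 6))
y = X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(322)
chosen, steps = forward_select(X, y)
for column, f_ratio, r in steps:
    print(f"added column {column}: F = {f_ratio:.1f}, multiple R = {r:.2f}")
```

Each printed row corresponds to one line of Tables 19 through 26: the predictor added, the F for the reduction of residual variance, and the multiple correlation for that predictor in combination with those above it.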




Leadership. The multiple biserial correla- 
tions with the leadership criterion for males 
and females are shown in Tables 19 and 20, 
respectively. There were four significant pre- 
dictors in the male sample and three in the 
female sample. The number of activities en- 
gaged in was in the predictive equation for 
both sexes. After this variable, however, 
inventory measures of socialization and extra- 
version added predictive variance for the fe- 
males, but did not appear at all in the male 
analysis. Among males, interest in realistic 
occupations and the Complexity-Simplicity 
scale entered the equation with negative 
weights and the Social Presence scale with a 
positive weight. Since the zero-order correlation
of the Complexity-Simplicity scale
with leadership was not significant, it prob- 
ably entered the equation in the male sample 
as a suppressor variable to identify those 
students with many activities who are not 
interested in realistic occupations, yet who are 
not leaders. 

Science. The multiple biserial correlations
with scientific achievement are shown in
Tables 21 and 22 for males and females, respectively.
For males the best single predictor
of scientific achievement was achievement in
science in high school. After variance due to
high school achievement was removed, scientific
achievement was predictable from judged
originality of free-response comment, PhD
aspiration, and preference for realistic occupations.
Among females the best single predictor
of scientific achievement was interest in realistic
occupations. When variance due to this
variable was removed the Sense of Destiny
scale and PhD aspiration added significant
variance to the predictions. One caution
should be mentioned with regard to the multiple
correlations for science achievement. Since
scientific achievement, as assessed in this
study, occurred very rarely, especially among
the female subjects, the standard error of the
biserial coefficient is quite high. Since the
variable-selection procedure used tends to
capitalize on any spuriously high relationship,
these coefficients could be expected to
shrink more than the multiple correlation
with other criteria on cross validation.


TABLE 19

MULTIPLE CORRELATIONS WITH THE LEADERSHIP
CRITERION OF THE BEST SUBSETS OF PREDICTORS

(N = 322 males)

                                                      Significance
                                          Direction   of reduction   Multiple
Predictor added to equation               of weight   of residual    correlation a
                                                      variance (F)

44. Number of creative activities             +         15.8**          .28
32. Preference for realistic occupations      −          9.5**          .34
21. Complexity-Simplicity scale               −          9.7**          .40
13. Social Presence scale                     +          5.1*           .43
31. Perceptive versus Judging scale           +          2.4            .44

a The multiple correlation beside each variable is that obtained
for that variable in combination with all the variables
listed above it.
*p < .05.
**p < .01.


TABLE 20

MULTIPLE CORRELATIONS WITH THE LEADERSHIP
CRITERION OF THE BEST SUBSETS OF PREDICTORS

(N = 164 females)

                                                      Significance
                                          Direction   of reduction   Multiple
Predictor added to equation               of weight   of residual    correlation a
                                                      variance (F)

12. Socialization scale                       +          9.9*        [illegible]
44. Number of creative activities             +         11.8*           .45
28. Introversion versus Extraversion scale    −          6.1*           .50
 7. Offices held in high school               +          3.5            .54

a The multiple correlation beside each variable is that obtained
for that variable in combination with all the variables
listed above it.
*p < .01.


TABLE 21

MULTIPLE CORRELATIONS WITH THE SCIENCE CRITERION
OF THE BEST SUBSETS OF PREDICTORS

(N = 322 males)

                                                      Significance
                                          Direction   of reduction   Multiple
Predictor added to equation               of weight   of residual    correlation a
                                                      variance (F)

 6. Scientific achievement in high school     +         28.9**       [illegible]
10. Originality of free-response comment      +          9.8**       [illegible]
41. PhD aspiration in high school             +          4.7*           .54
32. Preference for realistic occupations      +          4.6*           .58
12. Socialization scale                       −          2.9            .59

a The multiple correlation beside each variable is that obtained
for that variable in combination with all the variables
listed above it.
*p < .05.
**p < .01.


TABLE 22

MULTIPLE CORRELATIONS WITH THE SCIENCE CRITERION
OF THE BEST SUBSETS OF PREDICTORS

(N = 164 females)

                                                      Significance
                                          Direction   of reduction   Multiple
Predictor added to equation               of weight   of residual    correlation a
                                                      variance (F)

32. Preference for realistic occupations      +          7.8**       [illegible]
19. Sense of Destiny scale                    +         10.1**          .64
41. PhD aspiration in high school             +          3.9*           .70
 2. SAT-Mathematical                          +          2.2         [illegible]

a The multiple correlation beside each variable is that obtained
for that variable in combination with all the variables
listed above it.
*p < .05.
**p < .01.

Art. The multiple correlations with artistic
achievement are shown in Tables 23 and
24, for males and females, respectively. In
both sexes the best single predictor was expressed
interest in artistic occupations. When
variance due to this source was removed the
best predictors for males were breadth of
activity and artistic achievement in high
school. For females the best predictors were
the Sense of Destiny scale and the self-rating
of originality.


TABLE 23

MULTIPLE CORRELATIONS WITH THE ARTISTIC
ACHIEVEMENT CRITERION OF THE BEST
SUBSETS OF PREDICTORS

(N = 322 males)

                                                      Significance
                                          Direction   of reduction   Multiple
Predictor added to equation               of weight   of residual    correlation a
                                                      variance (F)

37. Interest in artistic occupations          +         55.0**          .38
44. Number of creative activities             +         19.8**          .44
 5. Artistic achievement in high school       +         12.2**          .48
 3. High school rank                          −          5.4*           .49
10. Originality of free-response comment      +          4.0*           .50
42. Desire to emulate the lives of
    scientists                                −          3.0            .50

a The multiple correlation beside each variable is that obtained
for that variable in combination with all the variables
listed above it.
*p < .05.
**p < .01.


TABLE 24

MULTIPLE CORRELATIONS WITH THE ARTISTIC
ACHIEVEMENT CRITERION OF THE BEST
SUBSETS OF PREDICTORS

(N = 164 females)

                                                      Significance
                                          Direction   of reduction   Multiple
Predictor added to equation               of weight   of residual    correlation a
                                                      variance (F)

37. Interest in artistic occupations          +         25.7**          .37
19. Sense of Destiny scale                    +         10.3**          .43
47. Originality, self-rating                  +          7.6**          .47
31. Perceptive versus Judging scale           +          5.7*           .50
44. Number of creative activities             +          4.6*           .52
41. PhD aspiration in high school             −          3.3            .54

a The multiple correlation beside each variable is that obtained
for that variable in combination with all the variables
listed above it.
*p < .05.
**p < .01.

Grades. The multiple correlations with 
grades are shown in Tables 25 and 26 for 
males and females, respectively. The best 
single predictor for males was the self-rating 
of scholarship and for females was high school 
rank. These two variables were highly cor- 
related with each other and it was probably 
a chance occurrence that one was selected as 
the first variable in the male analysis and 
the other in the female analysis. Once one 
was selected the other tended to not be se- 
lected because of its high correlation with the 
first selected predictor. For males the next 
best predictors were the Femininity scale and, 
with a negative weight, the Perceptive versus 
Judging scale. For females, the second and 
third selected predictors of grades were the 
Persistence scale and PhD aspiration. 

It is interesting to note that the best predictors
of all criteria for both males and females
were measures of either past performance,
personality, or interests. The SAT did
not contribute to any of the regression equations.
This, of course, would not be likely in
a group less homogeneous in aptitude than
this one.


TABLE 25

MULTIPLE CORRELATION WITH COLLEGE GRADES OF
THE BEST SUBSETS OF PREDICTORS

(N = 322 males)

                                                      Significance
                                          Direction   of reduction   Multiple
Predictor added to equation               of weight   of residual    correlation a
                                                      variance (F)

52. Scholarship, self-rating                  +         41.3**          .34
11. Femininity scale                          +         24.1**          .42
31. Perceptive versus Judging scale           −         13.7**          .46
 3. High school rank                          +          7.0**          .48
18. Acquiescence scale                        +          5.2*           .49
14. Persistence scale                         +          4.0*           .50
 5. Artistic achievement in high school       +          3.9*           .51
 6. Scientific achievement in high school     +          2.7            .51

a The multiple correlation beside each variable is that obtained
for that variable in combination with all the variables
listed above it.
*p < .05.
**p < .01.


TABLE 26

MULTIPLE CORRELATION WITH COLLEGE GRADES OF
THE BEST SUBSETS OF PREDICTORS

(N = 164 females)

                                                      Significance
                                          Direction   of reduction   Multiple
Predictor added to equation               of weight   of residual    correlation a
                                                      variance (F)

 3. High school rank                          +          9.7**          .24
14. Persistence scale                         +          6.9**          .31
41. PhD aspiration in high school        [illegible]     5.4*           .35
37. Interest in artistic occupations          +          3.9*           .38
15. Superego scale                            +          5.3*           .42
22. Mastery scale                             −          6.0*           .45
 2. SAT-Mathematical                          +          4.4*           .47
53. Scholarship, self-rating                  +          3.7            .50

a The multiple correlation beside each variable is that obtained
for that variable in combination with all the variables
listed above it.
*p < .05.
**p < .01.


TABLE 27

MULTIPLE CORRELATION OF THE BEST SUBSETS OF
PREDICTORS WITH THE VARIOUS CRITERIA
CORRECTED FOR SHRINKAGE

                                                    Multiple R corrected
                             Multiple R             for shrinkage (from
                             (all significant       estimated multiple R
Criterion                    predictors)            with 39 predictors)
                             Males    Females       Males    Females

Grades                        .51       .47          .44       .15
Leadership achievement        .43       .50          .33       .25
Scientific achievement        .58       .70          .54       .63
Artistic achievement          .50       .52          .43       .30


Correction for Shrinkage. The multiple
correlations shown in Tables 19 through 26
represent the best correlations obtainable 
from the 39 predictors included in the analy- 
sis. In each case the maximum correlation 
obtainable from all predictors is approached 
closely by 8 or fewer predictors. Since these 
multiple correlations capitalize on any chance 
relationship among the criteria and the 39 
predictors it is to be expected that they would 
shrink when the predictive equation was ap- 
plied to a new sample with different chance 
relationships. The amount of shrinkage to be 
expected was calculated by the Wherry 
shrinkage formula. The multiple correlations 
corrected for expected shrinkage are shown 
in Table 27. 

Since the formula for shrinkage assumes 
that all 39 predictors are in the regression 
equation, each coefficient obtained with 8 
predictors was increased by .03 as an esti- 
mate of the coefficient obtainable with all 39 
predictors. These values were then corrected 
for shrinkage. Because of the smaller N, the 
multiple correlations for females were gen- 
erally higher than those for the male sample, 
yet tended to be lower after correction for 
shrinkage. The extraordinarily high multiple 
R obtained for science achievement in the 
female sample may be spuriously high, since 
there were very few female science achievers 
and the standard error of the biserial is rela- 
tively large. 
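
The correction just described can be reproduced numerically. The sketch below uses one common statement of the Wherry formula, with N − k in the denominator; the monograph does not print the formula, so this form is an assumption, adopted because it reproduces the legible entries of Table 27. The function and variable names are illustrative. Each observed multiple R is first raised by .03, as the text describes, and then shrunk for k = 39 predictors.

```python
import math


def wherry_shrunken_r(r, n, k):
    """Shrunken multiple R: R'^2 = 1 - (1 - R^2)(N - 1)/(N - k).

    The N - k denominator is an assumed form of the Wherry formula.
    """
    r2_adjusted = 1.0 - (1.0 - r * r) * (n - 1) / (n - k)
    return math.sqrt(max(r2_adjusted, 0.0))


N_MALES, N_FEMALES, K = 322, 164, 39
BUMP = 0.03  # estimate of the gain from using all 39 predictors

# Observed multiple R for the significant predictors (males, females).
observed = {
    "Grades": (0.51, 0.47),
    "Leadership achievement": (0.43, 0.50),
    "Scientific achievement": (0.58, 0.70),
    "Artistic achievement": (0.50, 0.52),
}

corrected = {
    criterion: (
        round(wherry_shrunken_r(males + BUMP, N_MALES, K), 2),
        round(wherry_shrunken_r(females + BUMP, N_FEMALES, K), 2),
    )
    for criterion, (males, females) in observed.items()
}

for criterion, (males, females) in corrected.items():
    print(f"{criterion:<24} males {males:.2f}   females {females:.2f}")
```

The heavier shrinkage for females follows directly from the formula: with N = 164 and k = 39 the ratio (N − 1)/(N − k) is about 1.30, against about 1.13 for the 322 males.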


Discussion 


The nature of the present sample limits the 
generality of the findings of this study. In a 
sample less homogeneous in aptitude than this 
one, ability measures would be much more
prominent in the list of predictors. Also it
is likely that many of the nonintellective 
predictors would have different relationships 
with the achievement criteria in a sample of 
students of average aptitude. However, the 
present sample represents precisely that group 
for which nonintellective predictors of achieve- 
ment are most badly needed. Aptitude tests 
will make good discriminations between un- 
selected students. The present study, how- 
ever, attempts to discriminate among stu- 
dents who are already highly selected on 
aptitude measures. Judged by conventional 
standards of predictability the coefficients of 
.30s and .40s obtained in this study are not
very impressive. However, when one con- 
siders that these coefficients were obtained 
from nonintellective predictors after severe 
selection on aptitude had already been made, 
they become more impressive. Such discrimi- 
nation could be helpful in choosing among 
applicants for scholarships, fellowships, and 
high level positions, since such applicants 
tend to be high in scholastic aptitude. 

Achievements of various kinds were not
highly related and were often predicted by
different sets of predictors. To the degree 
that predictors are applicable only to particu- 
lar criteria, their usefulness for a selection 
program with broad objectives is limited. One 
might not want to use a predictor—however 
valid for a particular criterion—if one were 
interested in a wider range of criterion per- 
formances. From this point of view, the 
measures of creativity and originality and of 
extraversion and social dominance are par- 
ticularly valuable since they seem to be re- 
lated to a variety of artistic and leadership 
criteria. 

Without exception, the results lend con- 
struct and predictive validity to the person- 
ality, originality, and attitude scales de- 
veloped by, or from the ideas of, Barron 
(1953, 1955), Budner (1959), Torrance and 
Ziller (1957), Gough (1957), Kubie (1958), 
and MacKinnon (1960). These scales, despite 
their divergent theoretical orientations, are
moderately successful in predicting artistic 
achievement. 


REFERENCES 


Barron, F. Complexity-simplicity as a personality
dimension. J. abnorm. soc. Psychol., 1953, 48,
163-172.

Barron, F. The disposition towards originality. J.
abnorm. soc. Psychol., 1955, 51, 478-485.

Budner, S. Intolerance of ambiguity as a factor in
medical education. Amer. Psychologist, 1959, 14,
345. (Abstract)

Cattell, R. B., Saunders, D. R., & Stice, G. Handbook
for the Sixteen Personality Factor Questionnaire.
Champaign, Ill.: Institute for Personality
and Ability Testing, 1957.

Couch, A., & Keniston, K. Yeasayers and naysayers:
Agreeing response set as a personality variable.
J. abnorm. soc. Psychol., 1960, 60, 151-174.

Gough, H. G. Researchers’ summary for the Differential
Reaction Schedule. University of California,
Berkeley, 1957.

Holland, J. L. Prediction of scholastic success for
a high aptitude sample. Sch. Soc., 1958, 86, 290-293.

Holland, J. L. The prediction of college grades from
the California Psychological Inventory and the
Scholastic Aptitude Test. J. educ. Psychol., 1959,
50, 135-142.

Holland, J. L. The prediction of college grades
from personality and aptitude variables. J. educ.
Psychol., 1960, 51, 245-254.

Holland, J. L. Creative and academic performance
among talented adolescents. J. educ. Psychol.,
1961, 52, 136-147.

Holland, J. L. Some explorations of a theory of
vocational choice: I. One- and two-year longitudinal
studies. Psychol. Monogr., 1962, 76(26, Whole
No. 545).

Holland, J. L., & Astin, A. W. The prediction of
the academic, artistic, scientific, and social achievement
of undergraduates of superior scholastic aptitude.
J. educ. Psychol., 1962, 53, 132-143.

Kubie, L. S. Neurotic distortion of the creative
process. Lawrence: Univer. Kansas Press, 1958.

MacKinnon, D. W. What do we mean by talent and
how do we test for it? In, The search for talent:
College admissions 7. New York: College Entrance
Examination Board, 1960.

Nichols, R. C., & Davis, J. A. Some characteristics
of students of high academic aptitude. Personnel
guid. J., 1963, in press.

Rokeach, M. Political and religious dogmatism: An
alternative to the authoritarian personality. Psychol.
Monogr., 1956, 70(18, Whole No. 425).

Stricker, L. J., & Ross, J. A description and evaluation
of the Myers-Briggs Type Indicator. Princeton,
N. J.: Educational Testing Service, 1962.

Torrance, E. P., & Ziller, R. C. Risk and life experience:
Development of a scale for measuring
risk taking tendencies. USAF PTRC tech. Rep.,
1957, No. 57-23.


(Received March 16, 1963) 


Vol. 77, No. 8 


Whole No. 571, 1963 


Psychological Monographs: General and Applied 




PROPOSED MODEL OF EGO FUNCTIONING: 
COPING AND DEFENSE MECHANISMS IN RELATIONSHIP TO IQ CHANGE 1


NORMA HAAN 
Institute of Human Development, University of California, Berkeley 




A model of ego functioning that includes both coping and defense mechanisms 
is described and its use is demonstrated in a study that reports the relationships 
of ego functioning to absolute intelligence and to IQ change from early adoles- 
cence to middle adulthood. The ego mechanism measures were developed from 
ratings of extensive interviews of the adult sample of the Oakland Growth 
Study. IQ change measures were derived from the Terman Group Test and 
results are reported for change in the total intelligence score as well as in the 
arithmetic and the verbal scores. Sex differences, social class, and sex role 
typing are likewise considered. The general results suggest that coping is related 
to, and presumably leads to, IQ acceleration and defense to IQ deceleration.


The conceptualization of ego functioning
requires, besides recognition of defense 
mechanisms, explicit and systematic consid- 
eration of the role of coping mechanisms. 
Although common agreement exists among 
psychologists as to the characteristics of de- 
fense mechanisms, attempts to deal with the 
nature and systematics of healthy behavior 
have produced much less clarity. Two com- 
plications immediately arise when psycholo- 
gists consider healthy behavior. The first 
involves the difficulties inherent in the utiliza- 
tion of social criteria of health: health in 
one culture may be sickness in another. The 
second difficulty (not always separate from 
the first) occurs when the definition of health 
is tied to a favored drive or goal; here one 
psychologist’s preference may be, and often 
is, another’s indifference. If processes, formally
defined and specifically isolated from
drive or cultural content, are used as criteria 


1 The subjects for this study are from the longi- 
tudinal Oakland Growth Study (OGS). The collec- 
tion of adult material and the subsequent data anal- 
ysis for this study was first supported by the Ford 
Foundation and later by Grant M-5300 from the 
National Institute of Mental Health, United States 
Public Health Service. 

The author wishes to thank J. Clausen and M. B. 
Smith for their evaluative and stimulating comments 
on various versions of this paper and to make grate- 
ful acknowledgment to J. Block for suggestions on 
content and methodology. 


some of these parochial difficulties can be 
avoided.

This paper assumes that the mental proc- 
esses involved in the various coping mecha- 
nisms and the classical defense mechanisms 
are identical. It is on the basis of the properties,
rather than the process, of each that
the distinction between coping and defense
mechanisms is drawn.2 These properties seem
commonly but implicitly used to distinguish 
behaviors that are essentially coping from 
those that are essentially defensive in na- 
ture. The relationship of these adult ego 
functions to change in IQ from the ages of 
12-37 years will be reported in this paper as 
a means of explaining the model and evaluat- 
ing its usefulness. 

The dilemma of current research, when the 
ego’s functions are conceived as primarily 
defensive in character, is illustrated by the 
recent work of Miller and Swanson (1960). 
Both before and after stimulation which was 
intended to arouse increased need and ensu- 
ing conflict, a number of their subjects failed 
to show initial and/or subsequent defensive 
behavior. Since Miller and Swanson had no 


2 The conceptual model of ego functioning employed
in this paper was worked out jointly in 1959
by the author and T. C. Kroeber (now at San Fran- 
cisco State College) when he was a member of the 
staff of the Institute of Human Development. The 
model is as much the result of Kroeber’s work as it 
is the author’s. 



conceptual provision for handling nondefen- 
sive behavior, they found it necessary to 
eliminate these subjects from their analysis; 
this may have excluded from consideration an 
important segment of the range of normal 
human behavior. 

The early psychoanalytic idea that the ego 
is wrought out of conflict between the id and 
superego, and therefore is defensive in func- 
tion, has evidently had a pervasive effect. 
Until recently psychoanalysis has placed upon 
sublimation—classified as a defense—the 
entire burden of explaining presumed non- 
pathological ego functioning. Fenichel (1945) 
said, “The successful defenses may be placed 
under the heading of sublimation [p. 141].” 
As late as 1957 Lampl-de Groot asked, “Is
‘defense’ in itself a pathological phenomenon 
or are we entitled to speak of ‘normal’ de- 
fense mechanisms and defensive processes 
[p. 114]?” Labeling sublimation as a de- 
fense mechanism would appear to be a con- 
fusion in terms, understandable in view of 
prior conceptual development, but no longer 
necessary. Recently Miller and Swanson and
later Swanson (1961) have specified the 
mental processes that have been implicitly 
assumed to underlie defense mechanisms. 
These authors do consider “expressive styles” 
which, they indicate, refer to “individual 
variation in the manner of performing adap- 
tive acts [Miller & Swanson, 1960, p. 289].” 
However, they later define expressive style as
“a third source of restriction” and thus
presumably a defensive behavior. 

A list of the properties that seem to have 
been generally, if sometimes implicitly, as- 
sumed in psychology and psychoanalysis to 
distinguish between defensive, pathological 
behavior and healthy, coping behavior is pre- 
sented below. Separate criteria will be pre- 
sented later for each of the processes of the 
10 classical defense mechanisms and the 10 
coping mechanisms which are proposed. 


Properties of a defense mechanism: 


1. Behavior is rigid, automatized, and 
stimulus bound. 

2. Behavior is pushed from the past, and 
the past compels the needs of the present. 


3. Behavior is essentially distorting of the 
present situation. 

4. Behavior involves a greater quantity of 
primary process thinking, partakes of unconscious
elements, and is thus undifferentiated
in response.

5. Behavior operates with the assumption 
that it is possible to remove disturbing affects 
magically. 

6. Behavior allows impulse gratification by 
subterfuge. 


Properties of a coping mechanism: 


1. Behavior involves choice and thus is 
flexible and purposive. 

2. Behavior is pulled toward the future 
and takes account of the needs of the present. 

3. Behavior is oriented to the reality requirements
of the present situation.

4. Behavior involves secondary process
thinking (see Footnote 3), conscious and pre- 
conscious elements, and is highly differen- 
tiated in response. 

5. Behavior operates within the organism’s 
necessity of “metering” the experiencing of 
disturbing affects.

6. Behavior allows forms of impulse satis- 
faction in an open, ordered, and tempered 
way. 

The model proposes, then, a differentiation 
of defensive ego functions from those that 
perform a coping function. Anna Freud 
(1937) noted in The Ego and the Mecha- 
nisms of Defense that the defenses 


have their counterpart [italics added] in the ego’s 
attempts to deal with the external danger by actively 
intervening to change the conditions of the world 
around it. Upon this last side of the ego’s activities 
I cannot enlarge here [p. 191]. 


3 Rapaport (1954b) in summarizing the assumptions
underlying the psychoanalytic theory of thinking
has noted the following distinctions between
primary and secondary processes: primary processes 
aim at an immediate discharge of impulse, whereas 
secondary processes regulate impulse by internally 
controlled delays; primary processes aim at direct 
discharge of impulses, whereas secondary processes 
fulfill more complex reality conditions and therefore 
result in differentiated, partial discharge of impulse; 
primary processes are regulated by the pleasure prin- 
ciple, whereas secondary processes are regulated by 
the reality principle. 



The concomitants of coping imply “an active 
intervention to change the world,” but the 
axis on which Anna Freud’s statement swings 
is that of outer, as opposed to inner, danger; 
that is, objective as opposed to neurotic 
anxiety. Coping processes are, however, not 
only responsive to external danger but are 
also responsive to internal danger. Coping 
mechanisms have neurosis-free functions in 
the sense that they are free to respond to 
the realities and nuances of external and/or 
internal threats, but this does not mean that 
coping behavior produces a conflictless state 
of being. On the contrary it is assumed that 
coping is actively associated with inevitable 
life conflict. In this sense being “well- 
defended” becomes a second best method of 
handling conflict. A lack, deficiency, or fail- 
ure of some or all ego functions, whether de- 
fensive or coping, might be assumed to be a 
prelude to hospitalization in our society. In 
confronting the overt psychosis, it thus would 
be pertinent to ask which defense and/or 
coping mechanisms have failed, which have 
remained in functioning order, and which 
coping or defensive functions prognosticate 
decompensation or recompensation. 

Reports from the Coping Project at the 
Menninger Foundation (Murphy, 1960, 
1962) indicate that Murphy and her associates 
have conceptualized the relationship between 
defense and coping behavior in a fashion that 
is similar to that used in the present paper. 
However, she separates autonomous ego func- 
tions from “coping devices,” a distinction 
which is not made in the present research. 
Since these autonomous ego functions appear 
to subsume individual differences in constitu- 
tional endowment, it is questionable whether 
this additional separation is necessary or even 
possible with adult subjects, however useful 
it might be in developmental observations of 
a preschool group such as Murphy’s. Differ- 
ent ego functions probably require different 
assumptions concerning the development and 
essential nature of each. Thus Rapaport
(1954a) and Hartmann (1955) in directing 
their interests toward the autonomous, con- 
flict-free ego sphere seem to focus primarily 
on impulse control, and specifically upon neu- 
tralization of impulses and “good” reaction 
formations. They, then, postulate a discon- 


tinuity in development, a phenomenon of 
emergence. In the view of the present writer, 
it does seem necessary to postulate such dis- 
continuity as part of the essential nature and 
development of a certain class of ego functions,
those concerned with the economics of
impulse. The sudden shifts in impulse eco- 
nomics that are clinically observed in psy- 
chotherapy and in childhood growth and 
development, and the problem for personality 
research posed by the “affinity of opposites,” 
emphasized by Frenkel-Brunswik (1942),
force the formulation. Continuity and steady 
increment in function would seem more likely 
to occur with ego functions that involve the 
constitutional cognitive activities (seeing, 
hearing, remembering, exploring) listed by 
Murphy as autonomous ego functions. 

Psychoanalysis has shown some tendency 
toward the facile, and probably insufficient, 
solution of subsuming cognitive ego functions 
under constitutional genetic factors (Hart- 
mann, 1955; Rapaport, 1954a). The con- 
siderable and growing body of psychological 
research concerned with the relationships between
cognition and personality seems to
make this assignation less tenable as cogni- 
tive functioning appears more malleable than 
inborn causation would presuppose. The 
structural and economical view of ego opera- 
tion described in this paper does not depend 
on this assumption or its contradiction; the 
research plan is based on the expectancy that 
there will be changes in cognitive function- 
ing, described as changes in IQ, and that the 
direction of these changes will be associated 
with the nature of the ego functioning of 
different individuals. Moreover, the questions 
of constitutionality, and of continuity or dis- 
continuity bear on the present formulation 
of the ego structure only as they clarify the 
necessity that separate provisions be made for 
different kinds of ego functions. 

In line with the general conceptual dis- 
tinction made above between most kinds of 
impulse and cognition, the present model of 
ego functioning makes provisions beyond that 
of coping and defense for impulse control and 
cognition control. These categories are not 
necessarily mutually exclusive: any particu- 
lar act of behavior may simultaneously be 
defensive and coping, and may involve both
cognitive and impulse control. The classical
defense mechanisms that seem primarily concerned
with impulse economics are displacement,
reaction formation, and repression.
When the processes that are involved
with each of these defense mechanisms are 
examined, the coping counterpart of displace- 
ment appears to be sublimation; the counter- 
part of reaction formation becomes substitu- 
tion; and the counterpart of repression, 
suppression. 

The defense mechanisms that seem pri- 
marily to involve cognitive activity, each 
paired with its coping counterpart, are: iso- 
lation-objectivity, intellectualizing-intellectu- 
ality, and rationalization-logical analysis. The 
model also includes four other pairs of de- 
fense and coping mechanisms which seem to 
involve mixtures of cognitive and impulse 
control or to be used first in one way and 
then in another. These are doubt and tolerance
of ambiguity, denial and concentration,
projection and empathy, and regression and 
regression in the service of the ego (playful- 
ness). 

Because each of these pairs of ego mecha- 
nisms involves homologous processes with, 
however, different properties, they may be 
combined on an a priori basis to arrive at a 
more abstract formulation of an ego mecha- 
nism that is neither defensive nor coping but 
reflects merely the ego process without regard 
for its use in a coping or a defensive direc- 
tion. The list that follows attempts to specify 
the processes that characterize the various ego 
mechanisms. The general ego mechanism is 
listed first, followed by a subdivision which 
indicates the defense mechanism and its cop- 
ing counterpart; both members of the pair are 
formally similar methods for handling similar 
phenomena, and they are not to be construed 
as being inversely related to each other. In- 
deed, if the constitutional-developmental hy- 
pothesis to explain individual differences is 
correct, some individuals would show prefer- 
ences for coping or defending within the same 
general ego process, e.g., a cognitively ori- 
ented individual may defend by isolation but 
cope by objectivity. This would produce a 
positive relationship between a defense 
mechanism and its coping counterpart. 


Processes of ego mechanisms

I. Discrimination (cognitive): the subject
has to separate idea from feeling,
idea from idea, and feeling from feeling.

A. Isolation: the subject keeps apart
ideas that belong together emotionally,
or keeps ideas and their
corresponding affects separated
(defensive).

B. Objectivity: the subject can separate
his ideas from his feelings, so
that he achieves an objective
evaluation or judgment where a
situation requires this. He is able
to separate his feelings from one
another when he is of two minds
(coping).

II. Detachment (cognitive): the subject
lets his mind “roam freely,” speculates,
and analyzes without stimulus
restriction.

A. Intellectualization (a subcategory
of isolation): the subject retreats
from impulse and affect to a preoccupation
with words and abstractions
(defensive).

B. Intellectuality: the subject is capable
of detachment in an affect
laden situation which requires impartial
analysis and awareness, or
is otherwise detached from restrictions
of the environment, experience,
or self so as to allow thoughts
free rein (coping).

III. Means-end symbolization (cognitive):
the subject analyzes the causal texture
of experience and anticipates
outcomes.

A. Rationalization: the subject offers
an apparently plausible causal context
to explain behavior and/or
intention, which allows impulse
gratification sub rosa, but omits
crucial aspects of the situation, or
is otherwise inexact (defensive).

B. Logical analysis: the subject is
interested in analyzing thoughtfully,
carefully, and cogently the
causal aspects of situations, personal
or otherwise. He proceeds
systematically in his exposition, or
if he backtracks, he is able to reorganize
his explanations (coping).

IV. Delayed response: the subject holds
up decisions and time-binds tension
due to situational complexity, lack of
clarity, or personal noncommitment.

A. Doubt and indecision: the subject 
is unable to resolve ambiguity. He 
doubts the validity of his own per- 
ceptions or judgments and is un- 
able to commit himself to a course 
of action. He hopes that problems 
will solve themselves or that some- 
one will solve them for him. The 
subject states situations or feelings, 
then qualifies them into meaning- 
lessness (defensive). 

B. Tolerance of ambiguity: the sub- 
ject withstands cognitive and af- 
fective complexity or dissonance. 
He is capable of qualified judg- 
ments, is able to think in terms of 
gray rather than black and white. 
He does not need to commit him- 
self to clear-cut choices in compli- 
cated situations where choice is 
impossible (coping). 

V. Selective awareness: the subject is 
able to focus his attention. 

A. Denial: the subject denies facts 
and feelings, present or historic, 
that would be painful to acknowl- 
edge. He may have a Pollyanna or 
oblivious attitude (defensive). 

B. Concentration: the subject is able 
to set aside recognizable dis- 
turbing or attractive feelings or 
thoughts in order to concentrate 
on the task at hand (coping). 

VI. Sensitivity: the subject apprehends 
and is aware of the unexpressed feel- 
ings or ideas of others. 

A. Projection: the subject attributes 
objectionable internal tendencies to 
another person or persons in the 
environment and does not acknowl- 
edge or recognize the source as 
himself. The objectionable tend- 
encies that are projected may be 
either id impulses and any of their 
derivatives, or superego attitudes 


and any of their derivatives (de- 
fensive). 

B. Empathy: the subject sensitively 
puts himself in the other fellow’s 
boots; he is able to imagine how 
the other fellow feels, and experiences
this en petit himself. His
relationships take account of the 
feelings of others (coping). 


VII. Time reversal: the subject replays or
recaptures experiences, feelings, attitudes,
and ideas of the past.

A. Regression: the subject resorts to
evasive, wistful, ingratiating,
non-age-appropriate behavior to avoid
responsibility, aggression, and generally
unpleasant demands from
others and self, and to allow concomitant
indulgence (defensive).

B. Regression in service of the ego
(playfulness): the subject utilizes
past feelings and ideas that are not
directly ordered or required by the
practical, immediate elements of
the situation in an imaginative way
in order to enrich his solution of
problems, his handling of situations,
and his enjoyment of life.
His regression is situationally
adaptive and responsive (coping).

VIII. Impulse diversion (impulse economics):
the subject modifies or changes
the object of an impulse.

A. Displacement: the subject tempo- 
rarily and unsuccessfully diverts 
unacceptable impulses, or affects, 
from their original objects or situa- 
tions, and then allows expression 
in a situation of greater internal or 
external permission. This may oc- 
cur as a temporal displacement 
(e.g., carrying frustrations home
from the office), or as an object 
displacement (e.g., resentment to- 
ward parents or authorities is ex- 
pressed in hostility toward the 
weak or defenseless—children or 
members of minority groups) (de- 
fensive). 

B. Sublimation: the subject finds alter- 
native channels and means, which 
are socially accepted, tempered, 


Norma Haan 


and satisfying, for the open ex- 
pression of basic impulses. Impulse 
expression is thus evident and ob- 
servable, but its temporal and 
object expression is personally re- 
warding to the subject rather than 
being productive of widening diffi- 
culties (coping). 

IX. Impulse transformation (impulse eco- 
nomics): the subject reverses the 
meaning of impulses, so that they are 
expressed as their opposites. 

A. Reaction formation: the subject’s 
intentional impulse expression does 
not admit the possibility or partial 
presence of socially unacceptable 
tendencies (hostility, dependency, 
opposite sex identifications, greed, 
dirtiness, etc.). As a result the 
subject is constantly vigilant and 
“protests too much” that his be- 
havior and thoughts are exemplary 
in terms of society’s sanctions. 
The monolithic and overdetermined 
aspects of this defense can pro- 
duce a defensive instability in 
which the rejected impulses are 
sporadically and erratically ex- 
pressed (defensive). 

B. Substitution: the subject’s behavior 
does not show socially unac- 
ceptable impulses (hostility, de- 
pendency, opposite sex identifica- 
tions, greed, dirtiness, etc.). He 
seems to have successfully neutral- 
ized (Hartmann, 1955) impulses 
or achieved a “functional auton- 
omy” (Allport, 1937), so that very 
little of his energy needs to be 
devoted to maintaining socialized 
behavior (coping). 

X. Impulse restraint (impulse econom- 
ics): the subject controls impulses by 
inhibiting their expression. 

A. Repression: the subject attempts 
to control his impulses. As he does 
not make a distinction between 
thought and action, the ideational 
representatives of his impulses are 
as threatening as the behavior it- 
self; consequently he excludes 
ideational representation from his 


thoughts, and success in excluding 
it from behavior varies among the 
subjects and temporarily varies 
within the subjects. Extensive re- 
pression can be manifested as an 
ideational constriction and a naive 
forgetfulness (defensive). 

B. Suppression: the subject holds in- 
feasible and inappropriate impulses 
in abeyance and restrains such ex- 
pression until an appropriate time 
or place presents itself. He makes
a distinction between thinking and 
acting, so that the ideational rep- 
resentative of an impulse is not, in 
itself, threatening and, therefore, 
put out of consciousness (coping). 


METHOD


Subjects 


The subjects for this study are an adult sample 
of the longitudinal OGS. Specifically, the subjects 
are the 49 men and 50 women who participated both 
in the interviewing program of the Ford Follow-up 
Study during 1957-60 and in the earlier study when 
they were between 12 and 18 years of age (Jones, 
Macfarlane, & Eichorn, 1959). Demographic data 
pertaining to the status of these interviewed subjects 
are presented in Table 1. It has been shown (Haan,
1962) that the subjects who dropped out in the ear- 
lier years of the study (1932-38) did not differ from 
the interviewed adult subjects on the following early 
adolescent variables: (a) general intelligence as meas- 
ured by the Terman Group Test of Mental Ability 
(TGT), (b) socioeconomic status, (c) childhood
family size, and (d) some selected adolescent per- 
sonality variables. 

The personality variables were those measured by 
the scales of the University of California Emotional 
and Social Adjustment Inventory. They included: 
family, social, and school adjustment; personal in- 
feriority; physical symptoms; fears; generalized ten- 
sions, etc. This inventory has been described in de- 
tail elsewhere (Stewart, 1962). 


Problem and Measures of IQ Change 


The study of IQ change in relation to the ego 
mechanisms was chosen for investigation because it 
appeared to be particularly relevant to the assumed 
essential differences between coping and defense, i.e., 
coping makes for free, informed, differentiated func- 
tioning, and defensiveness makes for rigid, compelled, 
distorted functioning. Thus we expected that coping 
functions generally would tend to be associated with 
IQ acceleration (positive IQ change) and that de- 
fensive functioning would generally be associated with 
IQ deceleration (negative IQ change). Several rather 



TABLE 1 


STATUS OF INTERVIEWED ADULT SUBJECTS 


Men Women 
Status (N=49) (N=50) 


1934 1958 1934 1958 


Socioeconomic a

Professionals, semiprofes- 
sionals PPG bs 13%). 9 315) Or 

Promoters, managers, officials 5 13 13 19 

Clerks and kindred workers,
small proprietors 16 1 

Skilled workers and foremen f6 

Semiskilled workers 3 

Unskilled workers 3 


Education 
Non-high-school graduate 0 
High school graduate 9 
.5-2 years college 8 
2+ to 3+ years college 10 
College graduate 8 
Graduate work 14 


Marital 
Married 44 
Separated 2 
Widowed 0 
Never married 3 


Age 
Range 36-39 37-38 
Mode 37 37 


a Edwards (1933) scale; rated on the socioeconomic status
of the subject’s parents in 1934 and on the subject himself in 1958.
b Rated on husband’s occupation unless single.


distinct modes of IQ change were anticipated for 
different kinds of individuals. For example, accelera- 
tion might be associated with expressive modes of 
ego functioning in one person and controlling and 
regulating ego functions in another. Sex differences 
themselves were of interest inasmuch as an accelerat- 
ing man would be fulfilling a cultural expectancy 
whereas an accelerating woman might be in general 
opposition to cultural expectancies. The interesting 
work on IQ change from the Fels Institute (Kagan, 
Sontag, Nelson, & Baker, 1958; Sontag, Baker, & 
Nelson, 1958) showed a relationship between rated 
femininity in girls and IQ deceleration, and a sex 
difference in IQ change in favor of the boys. Bay- 
ley (1949) did not find a sex difference in IQ change. 
The present study differs somewhat from both Bay- 
ley’s and Sontag’s with respect to the time periods 
for which data are available. (Their subjects were 
younger than the OGS subjects at both the beginning
and the end of the period under study.) Furthermore,
the present study employs ego mechanism
variables which differ from those employed by these 
previous investigators. This study is not, then, a 
replication of previous work but it was hoped that 


it would bring further clarification and elaboration 
to the relationships between IQ change and personality
and sex differences.

The measure of IQ was developed from two dif- 
ferent administrations of TGT. The subjects had 
taken TGT in 1933, when they were about 12 years 
old, and this same test was administered again when 
they were in their mid- and late thirties. The interviewed
subjects’ total scores and their scores on two
subtests, selected because of an assumed difference in 
the type of mental function measured, were rank 
ordered. The two subtests were a verbal one (sen- 
tence meaning) and a numerical one (arithmetic). 
Ranks were assigned in the following manner: (a) 
separately for childhood and adulthood, (b) only to 
adult interviewed subjects who had taken both tests 
(41 men and 44 women), (c) to the sexes combined. 
An IQ change measure was then obtained for each 
subject by subtracting the 1933 rank from the 1955 
rank and adding a constant of 100. 
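The rank-difference computation just described can be sketched as follows. This is an illustration only, not the procedure actually used in the study: the function names are hypothetical, ranks are assumed to run upward (lowest score = rank 1, so that a gain in relative standing yields a value above 100), and tied scores are assumed to share the mean of their ranks, a detail the text does not specify.

```python
def average_ranks(scores):
    """Rank scores in ascending order (lowest score -> rank 1).

    Tied scores share the mean of the ranks they occupy (an assumption;
    the paper does not say how ties were handled).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def iq_change(childhood_scores, adult_scores, constant=100):
    """Adult rank minus childhood rank, plus a constant of 100."""
    childhood_ranks = average_ranks(childhood_scores)
    adult_ranks = average_ranks(adult_scores)
    return [a - c + constant for c, a in zip(childhood_ranks, adult_ranks)]


# Four hypothetical subjects, sexes combined, ranked separately per occasion.
print(iq_change([90, 110, 100, 120], [130, 120, 125, 140]))
# -> [102.0, 98.0, 100.0, 100.0]
```

Under these assumptions a value above 100 marks relative acceleration and a value below 100 relative deceleration; the same function would be applied separately to the total, verbal, and arithmetic scores.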

Although there are compelling reasons for using the 
rank ordering procedure, it also has a number of 
inherent difficulties. The fact and extent of accelera- 
tion or deceleration is completely dependent upon the 
individual’s relative status within the group. A num- 
ber of subjects, about 10, who had top ranks in the 
group in 1933 and who maintained their high position 
in the group in 1955 were penalized by the down- 
wardly skewed nature of the adult distribution. At 
the upper end of this distribution a very small differ- 
ence in raw score produced a lower rank and caused 
some of these subjects to be identified as decelerators.
The skewed adult distribution occurs because this 
particular test has too low a ceiling for adequately 
measuring the range of ability within this adult 
sample. A more difficult intelligence test and an 
absolute rather than a relative measure of change 
would in all probability have more accurately identi- 
fied the accelerators. The method of extreme groups 
was adopted to overcome this difficulty and to allow 
clearer comparison of acceleration and deceleration, in 
spite of its disadvantage of reducing the number of 
subjects available for comparison. The extreme 
groups which are used in some of the analyses rep- 
resent as nearly as possible the upper and the lower 
25% of each sex. 
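A minimal sketch of the extreme-groups selection follows. The function name, the subject labels, and the rounding rule for the group size are assumptions made for illustration; the text says only that the groups represent "as nearly as possible" the upper and lower 25% of each sex.

```python
def extreme_groups(change_by_subject, fraction=0.25):
    """Return (decelerators, accelerators) for one sex.

    change_by_subject maps a subject label to an IQ change score.
    The group size is the nearest whole number to fraction * N,
    with a minimum of one subject (an assumed rounding rule).
    """
    ordered = sorted(change_by_subject, key=change_by_subject.get)
    k = max(1, round(len(ordered) * fraction))
    return ordered[:k], ordered[-k:]


# Hypothetical change scores for eight subjects of one sex.
men = {"s1": 80, "s2": 95, "s3": 100, "s4": 101,
       "s5": 104, "s6": 110, "s7": 120, "s8": 99}
decelerators, accelerators = extreme_groups(men)
print(decelerators, accelerators)  # -> ['s1', 's2'] ['s6', 's7']
```

The function would be called once for the men and once for the women, since the extreme groups are formed within each sex.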


Measures of Ego Mechanisms 


Subjects were interviewed in accordance with a 
schedule that covered their adolescent memories of
self; social-family interaction; and present status of 
self, occupation, family, etc. The number of interviewing
hours varied according to the subjects’ available
time, talkativeness, and motivation. The total
number of interviewing hours for each subject ranged 
from 2 to 36 hours with a mean of 12.4 hours. These 
interviews were summarized by the interviewer, but 
some verbatim material was included. They were 
then rated by the interviewer on the ego mechanisms: 
10 defense and 10 coping mechanisms, and general 
level of drive. Separate independent ratings were 
made from the typescript of the interview by another 
judge who had not seen the subject. A 5-point rating
scale was used, but the actual score for an ego
mechanism was the summation of the ratings given 
the subject by the interviewer and by the judge. 

In addition to the rater and interviewer-rater judg- 
ments on the 10 defense mechanisms, the 10 coping 
mechanisms, and on general level of drive, various 
other theoretically conceived measures were constructed
on an a priori basis. These additional ratings
and the basis for their construction follow:


1. General ego mechanisms: a summation of a 
coping rating and a defense rating, e.g., empathy + 
projection = sensitivity 

2. Total coping: summation of all coping ratings 

3. Total defense: summation of all defense ratings 

4. Total ego: summation of all coping and defense 
ratings 

5. Cognitive mechanisms: summation of defense 
and coping ratings within the cognitive sector of the 
ego 

6. Impulse economics: summation of defense and 
coping ratings within the impulse economics sector 
of the ego 

7. Coping impulse economics: summation of cop- 
ing ratings within the impulse economics sector of the 
ego 

8. Defense impulse economics: summation of de- 
fense ratings within the impulse economics sector of 
the ego 

9. Ego dispersion: the ratio of ranked total coping 
over ranked total defense 
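
The construction of these derived measures can be illustrated with a short sketch. It is a hedged example only: the function names and the numbers are hypothetical, and two details that the text does not state are assumed here, namely that ranks run from 1 (lowest) upward and that tied scores share an average rank.

```python
def combined_score(interviewer_rating, judge_rating):
    """Score for one mechanism: the sum of the two raters' 5-point ratings."""
    return interviewer_rating + judge_rating


def average_ranks(values):
    """Rank values from 1 (lowest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def ego_dispersion(total_coping, total_defense):
    """Ratio of ranked total coping over ranked total defense, per subject."""
    coping_ranks = average_ranks(total_coping)
    defense_ranks = average_ranks(total_defense)
    return [c / d for c, d in zip(coping_ranks, defense_ranks)]


# Hypothetical totals for three subjects.
total_coping = [40, 55, 47]
total_defense = [50, 30, 45]
dispersion = ego_dispersion(total_coping, total_defense)
```

Under these assumptions a ratio above 1 marks a subject who stands higher in the group on coping than on defense.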


Four other measures became available for the analysis
as a result of factor analyses of the ego mechanisms.

Two principal axis factor analyses, one for each
sex, were done of the 20 basic ego ratings, the 1938
Stanford-Binet IQ, and the rating of level of drive. (None
of the derived measures was included in the factor 
analysis because this would have involved part-whole 
correlations within the correlation matrix.) Five 
factors appeared to exhaust the communal variance, 
and these were submitted to varimax rotation. A
reasonably parallel factorial structure for the first
four factors for each sex resulted, and the factor
loadings have suggested the following names: (a)
controlled coping; (b) expressive coping; (c) structured
defense; (d) primitive, anticognitive defense.

The fifth factor clearly does not have a comparable 
structure for men and women; for the sake of sim- 
plicity the factor scores to be used for individual 
subjects will include those for the first four factors 
only. The loadings for all factors are, however, shown 
in Table 2. 


Additional Variables 


Although the various intelligence measures and the 
ego mechanisms are the main focus of this paper, 
several other kinds of variables will be used in the 
last two analyses. These are: (a) ratings of the
subjects’ sex role behavior which were made when
they were 16 years of age, (b) the subjects’ score on 
the Femininity scale of the California Psychological 


TABLE 2
EGO FACTORS AND MECHANISM LOADINGS FOR EACH SEX > .400

Men                          Loading    Women                        Loading

Controlled coping
Substitution                   .817     Intellectuality                .792
Concentration                  .811     Suppression                    .766
Regression                    −.787     Concentration                  .762
Suppression                    .769     Sublimation                    .756
Logical analysis               .646     Logical analysis               .731
Objectivity                    .617     Regression                    −.715
Sublimation                    .613     Objectivity                    .675
                                        Drive                          .543
                                        Substitution                   .542
                                        Tolerance of ambiguity         .513
                                        Intellectualizing              .446

Expressive coping
Tolerance of ambiguity         .807     Regression in service of ego   .706
Empathy                        .734     Empathy                        .609
Regression in service of ego   .644     Displacement                   .560
Denial                        −.496     Tolerance of ambiguity         .537
Objectivity                    .490     Substitution                   .479
Logical analysis               .481
Intellectuality                .465

Structured defense
Displacement                   .833     Projection                     .675
Rationalization                .715     Intellectualizing              .569
Projection                     .657     Rationalization                .566
Isolation                      .522     Isolation                      .558
Reaction formation             .489     Displacement                   .540
                                        Reaction formation             .530

Primitive, anticognitive defense
Repression                     .701     Repression                     .674
Drive                         −.647     Denial                         .650
Doubt and indecision           .567     Empathy                       −.423
Intellectuality               −.496     Logical analysis              −.422
Logical analysis              −.485
Denial                         .408
Isolation                      .401

Factor that was nonsymmetrical for men and women
Intellectualizing              .756     Doubt and indecision          −.834
Stanford-Binet IQ              .563     Drive                          .624

Inventory (CPI) which was administered at the 
time of the adult follow up, and (c) the subjects’ 
scores on an Ego Control scale of the same CPI test. 
The various attributes of these measures and the 
methodology employed will be described in detail 
within the context of these later sections.


EGO FUNCTIONING IN RELATIONSHIP TO IQ CHANGE


RESULTS 


Before the results of the main comparisons 
of IQ change with the ego mechanisms are 
reported, some attention must be paid to a 
number of comparisons which take a logically 
preliminary position. The first of these is the 
reliability of the ego mechanisms and the IQ 
change measures. Secondly, the ego mecha- 
nisms’ relationships to the absolute level of 
intelligence will be presented. The nature of 
these intelligence-ego mechanism findings will 
make it necessary to report additional prop- 
erties of the IQ change measures to insure 
that the ego mechanisms’ relations to the 
absolute level of IQ and to IQ change are 
not redundant results. Because attention will 
be directed later to the differences between 
the sexes in the manner and mode of IQ 
change, another set of preliminary compari- 
sons will show the sex differences for the IQ 
change measures and the ego mechanisms 
themselves. Following these initial necessi- 
ties, five main analyses will be presented to 
show the relationships between ego function- 
ing and IQ change. They are: (a) ego 
mechanism differences between IQ accelera- 
tors and decelerators, within sex; (b) sex 
differences within IQ acceleration or deceleration; (c)
differences between IQ accelerators
and decelerators, within sex and an independ- 
ent measure of ego control; (d) relationships 
of IQ change to sex typing; (e) IQ change, 
social status, and years of education. 


Preliminary Comparisons 


Reliability. Agreement between the inter- 
viewer and the independent rater was com- 
puted for 10 defense and 10 coping mecha- 
nisms, 10 ego summary measures, total 
defense, total coping, and total ego. These 
reliabilities are shown in Table 3. The follow- 
ing measures were considered unreliable: (a) 
men: reaction formation; (b) women: isola- 
tion, discrimination, tolerance of ambiguity, 
and substitution.

Because the unreliable measures were dif- 
ferent for men and for women, it was decided 
to retain all of them in some of the analyses 
to preserve theoretical symmetry between the 
sexes. Therefore the two factor analyses have 
used these ratings, and some of the theoreti- 
cally conceived measures, such as total de- 
fense, have included an unreliable measure 
for one or the other sex. This does not bias 
the results as the effect of the unreliable 
measure can be considered to be adventitious.


TABLE 3 
RELIABILITY CORRELATIONS OF EGO MECHANISM RATINGS a
Defi Coping Ego 
BaS èl P Men Women mechanism Men Women 
mechanism MenWomen mechanism Cea) 
Isolation K EST Objectivity .13 AT Discrimination 67.08 
Intellectualizing 78 AT Intellectuality 80 83 Detachment 82 80 
Rationalization 70 38 Logical analysis 83 54 Means-ends sym- 
bolization 76 56 

Doubt 357 51 Tolerance of am- 

biguity 58 —.11 Delayed response 64 42 
Denial 16 SE Concentration 13 54 Selective awareness s9 (45 
Projection 76 55 Empathy 57 39 Sensitivity 42 37 
Regression 79 75 Regression service 

of ego 72 66 Time reversal % 82 73 
Displacement 56 64 Sublimation 15: 42 Impulse diversion .64 3i 
Reaction formation .20 58 Substitution 51 .00 Impulse transfor- 

mation 32 34 
Repression AS 4S Suppression 59 36 Impulse restraint 60 34 
i i 5 X coping rate 68 42 X general mechanism 
X defense rating 66 0 oping Boe fa bs 
Total defense 64 64 Total coping 82 «62 Total ego 70 43 
a Reliabilities were corrected for attenuation by the Spearman-Brown formula. Male N = 49; female N = 50.
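
The Spearman-Brown step-up named in the table footnote can be written out as follows. This is a minimal sketch: the function name is hypothetical, and the choice of k = 2 (because each score sums two raters' ratings) is an assumption about how the formula was applied, since the text does not spell it out.

```python
def spearman_brown(r, k=2):
    """Reliability of a composite of k parallel ratings, given the
    correlation r between two single ratings (Spearman-Brown formula)."""
    return k * r / (1 + (k - 1) * r)


# Example: an inter-rater correlation of .50 between single ratings.
stepped_up = spearman_brown(0.50)
```

With k = 2 the formula reduces to 2r / (1 + r), so a correlation of zero or below between the two raters cannot be raised by the correction.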




The reliability of the IQ change measures 
could not be confidently estimated because of 
the unavailability of certain required data. 
The IQ change measure is essentially a differ- 
ence score and, in order to estimate the 
reliability of this difference score, the separate 
reliabilities of the early and late measures 
are both required, together with the knowl- 
edge of the correlation between the two 
(Gulliksen, 1950, p. 353). Although a rea- 
sonable reliability estimate for TGT admin- 
istered during adolescence could be developed, 
no such possibility existed for estimating the 
reliability of TGT administered to the sub- 
jects as adults. If the reliability of the adult 
administered TGT is taken to be the same 
as the reliability of the adolescent adminis- 
tered TGT, then, for the sexes combined, the 
reliability of the change in total TGT scores 
is .71, and the reliabilities of the arithmetic 
and verbal subtests change scores are .38 and 
.28, respectively. These latter two estimates 
are lower than would have been desirable but 
are adequate for extreme group comparisons. 
As Sontag, Baker, and Nelson (1958) have 
previously noted (p. 23) in a similar context, 
the usefulness of IQ change scores is to be 
sought in the coherence and consistency of 
the relationships surrounding the measure 
rather than through reliability extrapolations 
based upon insecurely held assumptions. It
was with this rationale that the subsequent
analyses were undertaken. 
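
The dependence of a difference score's reliability on the two separate reliabilities and on their intercorrelation can be made concrete with the standard formula for the reliability of a difference. The sketch below is illustrative only: the function names are hypothetical, errors of measurement are assumed uncorrelated, and equal standard deviations are assumed by default (a reasonable simplification for ranked scores, but an assumption nonetheless). The second function back-solves the common reliability implied by a reported figure; it is arithmetic on the reported values, not a figure given by the author.

```python
def difference_reliability(r_xx, r_yy, r_xy, sd_x=1.0, sd_y=1.0):
    """Reliability of the difference X - Y.

    r_xx, r_yy: reliabilities of the two measures.
    r_xy: correlation between the two measures.
    sd_x, sd_y: standard deviations (equal by default, an assumption).
    """
    covariance = r_xy * sd_x * sd_y
    true_variance = sd_x ** 2 * r_xx + sd_y ** 2 * r_yy - 2 * covariance
    observed_variance = sd_x ** 2 + sd_y ** 2 - 2 * covariance
    return true_variance / observed_variance


def implied_common_reliability(r_dd, r_xy):
    """Common reliability r (r_xx = r_yy = r, equal SDs) that yields a
    difference-score reliability r_dd when the measures correlate r_xy."""
    return r_dd * (1 - r_xy) + r_xy


# Reported for the total score: difference reliability .71, with a
# 1933-1955 correlation of .73 between the two administrations.
implied_total = implied_common_reliability(0.71, 0.73)
check_total = difference_reliability(implied_total, implied_total, 0.73)
```

The formula shows why the subtest change scores fare worse: the closer the separate reliabilities are to the correlation between the two administrations, the less reliable variance is left in the difference.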

Relation of Ego Mechanisms to Intelli- 
gence. The scores for the various mechanisms 
were correlated with the subjects’ 1938 
Stanford-Binet IQs. This particular intelli- 
gence measure was chosen from the various 
ones available because it was considered the 
most comprehensive single measure of the 
subjects’ ability. Defense mechanisms, by 
their defined properties, should have gener- 
ally negative correlations with IQ and coping 
mechanisms, by their defined properties, 
should have positive correlations with IQ. 
Table 4 shows that this hypothesis has been 
generally borne out. Intelligence correlates 
positively with all coping mechanisms and in 
some instances significantly. Intelligence 
generally correlates negatively with defenses, 
but the only correlation of significance is that 
between displacement and IQ. The defense 
mechanism of intellectualizing is an exception 
to this general trend. Correlations involving 
summary ego measures obviously represent 
something in between these two general 
trends. 

Intelligence, as a cognitive measure itself, 
was expected to show a higher relationship to 
the cognitive aspects of ego functioning than 
to the ego functions concerned with impulse
control. The pattern and levels of these inter- 


TABLE 4 
CORRELATIONS BETWEEN STANFORD-BINET IQ AND EGO MECHANISMS
Defense Copin; Ego 
mechanism te kets AEA Een ES Men Women 
Isolation —.23 — Objectivity 47** 24 Discrimination .18 —" 
Intellectualizing 24 .27* Intellectuality 67** = 51** Detachment 58** 49*+ 
Rationalization —.15 —.14 Logical analysis A0** 45** Means-ends sym- 
Doubt —.22 —.22 Tolerance of am- bolization 21 10 
is biguity 39** —* Delayed response 11 31* 
Denial —.25 —.22 Concentration 18.05 Selective aware- 
ney ness —.08 —.13 
Projection —.23 —.18 Empathy BEI .22 Sensitivity —.11 .03 
Regression —15 —.04 Regression in service 
of ego 44** 17 Time reversal 17 08 
Displacement —.290* 06 Sublimation Bee 12 Impulse diversion .09 13 
Reaction forma- Substitution 38** —" Impulse transfor- 
tion —* 0% mation 18 09 
Repression —.20 —.08 Suppression 26.22 Impulse restraint .06 .09 
Total defense —.31* —.03 Total coping 54** 35% Total ego 31 31 


a Unreliable.
* p = .05 (two-tailed test).
** p = .01 (two-tailed test).




correlations for the sexes, treated separately, 
were almost identical; therefore, results re- 
ported below are for the men’s and women’s 
measures combined. Cognitive mechanisms 
correlated .53 with IQ; impulse economics 
correlated .17 with IQ; however, cognitive 
mechanisms and impulse economics correlated 
.51 with each other. Thus, the IQ correla- 
tions support the hypothesis. Moreover the 
relationship between impulse economics and 
cognitive mechanisms shows that the variance 
within cognitive mechanisms is as much due 
to impulse economics as it is to intelligence. 
This latter finding suggests that cognitive 
mechanisms are not merely a rated version 
of formal intelligence, but that they may also 
include other cognitive behaviors which are 
not measured by an intelligence test such as 
the Stanford-Binet.
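
The comparison of variance shares in this paragraph rests on squared correlations. As a small worked check (the function name and dictionary labels are hypothetical; the three correlations are those reported above):

```python
def shared_variance(r):
    """Proportion of variance two measures share: the squared correlation."""
    return r ** 2


correlations = {
    "cognitive mechanisms with IQ": 0.53,
    "impulse economics with IQ": 0.17,
    "cognitive mechanisms with impulse economics": 0.51,
}
shared = {label: round(shared_variance(r), 2)
          for label, r in correlations.items()}
```

On these figures cognitive mechanisms share roughly the same proportion of variance with impulse economics as with IQ, which is the sense in which the text calls the two contributions comparable.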

Properties of the IQ Change Measures. 
Most previous studies of IQ constancy or 
variability have been able to report the 
amount of change in an absolute sense, such 
as change in IQ points, for the sample under 
study. The extent of change for the present 
subjects cannot be reported in this way be- 
cause of the rank ordering procedure used. 
Moreover, the difference scores themselves are 
a measure of shift within the group. The mean 
difference score for the total test was 13.65; 
for the arithmetic subtest, 20.84; and for the 
verbal subtest, 24.32. The correlations be- 
tween the 1933 and the 1955 scores express 
this finding in another way: the total score 
correlation is .73; arithmetic subtest, .43; and 
verbal subtest, .25. 

The previous section showed positive rela- 
tionships between coping mechanisms and in- 
telligence and generally negative relationships 
between defense mechanisms and intelligence. 
Now, if absolute intelligence and IQ change
were found to have a strong association with 
each other, a confirmation of the hypothesized 
relationship between IQ acceleration and cop- 
ing would be merely a repetition of the afore- 
mentioned findings. In order to test this 
possibility, the relationship between the 1938 
Stanford-Binet, taken at the end of adoles- 
cence, and the IQ change scores was ascer- 
tained. If IQ change were simply a further 
expression of absolute IQ level, then the cor- 
relation between IQ change and the 1938 


Stanford-Binet should be substantial and 
positive. When computed, these correlations 
for the total score, and the arithmetic and 
verbal subtests, respectively, prove to be of 
zero order: —.04, .07, and .03. At the same 
time the various TGT absolute scores cor- 
related positively and fairly substantially with 
the Stanford-Binet. The 1933 TGT total 
score, and arithmetic and verbal subtests 
correlate .73, .40, and .25, respectively, with 
the Stanford-Binet, and the comparable 1955 
TGT correlations are .77, .48, and .53 (see Footnote 4).
These two sets of TGT relationships with the 
Stanford-Binet—the zero order correlations 
with the difference scores and the substantial 
correlations with the absolute scores—allow 
the assumption that IQ change scores rep- 
resent an unbiased selection of various ability 
levels and do measure change in intelligence 
over time. 

Sex Differences for the Ego Mechanisms 
and IQ Change. Table 5 shows that the male 
subjects were rated significantly higher than 
the female on a number of coping mechanisms 
and general ego mechanisms, but that the 
only defense mechanism which shows a sig- 
nificant sex difference is intellectualizing. The 
significant differences between the sexes for 
coping measures are heavily weighted in the 
area of cognitive functioning; impulse trans- 
formation is the only mechanism from the 
impulse economics section of the ego that 
differs for the sexes. It will also be noted 
that this sample of men is generally more in- 
telligent and has a higher level of drive than 
the sample of women. The women’s means 
are higher (but the differences do not reach 
significance) on the following variables: 
doubt, empathy, sensitivity, displacement, re- 


4 It is interesting to note that the correlation of the 
adult verbal subtest with the late adolescent (1938) 
Stanford-Binet is relatively high (.526) but the cor- 
relation of the same early adolescent (1933) verbal 
subtest with the Stanford-Binet is low (.252). These 
figures were computed for men and women in a 
combined group. The difference in these two relation- 
ships is particularly striking when it is noted that 
the adolescent-adult correlation spans 20 years in 
time, whereas the early-late adolescent correlation 
spans only 5 years. This pattern of relationships may 
suggest that substantial shifts in the subjects’ verbal 
ability had already occurred by the end of the formal 
adolescent period. 




TABLE 5

SIGNIFICANT DIFFERENCES BETWEEN THE SEXES FOR
EGO MECHANISM RATINGS, 1938 STANFORD-BINET
IQ, AND GENERAL LEVEL OF DRIVE

                                       Men (N = 49)      Women (N = 50)
Ego mechanism             t ratio a     M       SD        M       SD

Objectivity                1.99*       5.14    1.97      4.46    1.40
Intellectualizing          4.12**      4.74    2.18      3.24    1.35
Intellectuality            2.89**      4.82    2.23      3.64    1.80
Detachment                 4.39**     11.55    3.44      8.84    2.66
Means-end symbolization    3.00**     12.76    2.34     11.18    2.85
Logical analysis           2.39*       5.24    1.96      4.34    1.80
Concentration              3.36**      5.67    1.78      4.50    1.69
Selective awareness        3.28**     13.94    2.42     12.28    2.60
Impulse transformation     2.61*      12.98    2.24     11.82    2.17
Total coping               1.76       58.41   14.20     53.88   11.21
Total ego                  3.12**    120.16   13.26    112.46   11.25
Binet IQ                   2.55*     118.96   11.64    112.98   11.65
Level of drive             2.13*       6.92    1.47      6.24    1.68
Cognitive mechanisms       4.07**     29.26    6.28     24.58    5.11

a Men’s M is higher in all cases (two-tailed test).
* p = .05.
** p = .01.


pression, impulse restraint, defense impulse 
economics, controlled coping, expressive cop- 
ing, and primitive defense. 
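
The t ratios in Table 5 can be checked approximately from the printed means and standard deviations. The sketch below assumes a pooled-variance independent-samples t test, which the text does not state explicitly, and reads the women's SDs for the first two rows as 1.40 and 1.35; the function name is hypothetical.

```python
import math


def t_from_summary(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Independent-samples t ratio from summary statistics, using the
    pooled variance estimate (an assumed choice of test)."""
    pooled_variance = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n_1 + 1 / n_2))
    return (mean_1 - mean_2) / standard_error


# Men (N = 49) versus women (N = 50), values taken from Table 5.
t_objectivity = t_from_summary(5.14, 1.97, 49, 4.46, 1.40, 50)
t_intellectualizing = t_from_summary(4.74, 2.18, 49, 3.24, 1.35, 50)
t_total_coping = t_from_summary(58.41, 14.20, 49, 53.88, 11.21, 50)
```

The recomputed values agree with the tabled t ratios to within rounding of the printed means and SDs, which supports this reading of the table.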

Comparisons between men and women for
the various IQ change scores were made with
the following results: (a) total ranked difference
score: t = 2.89, p < .002, df = 83;
(b) arithmetic ranked difference score: t =
3.97, p < .001, df = 83; (c) sentence meaning
ranked difference score: t = 1.07, p =
.14, df = 83. In all cases the men’s means
are in the direction of greater acceleration.
These results are in agreement with those 
previously reported by Sontag and his asso- 
ciates (1953). 


Main Comparisons 


Ego Mechanism Differences between IQ 
Accelerators and Decelerators within Sex. 
Table 6° shows the significant mean differ- 


5 Expanded versions of Tables 6, 7, 8, and 9 have 
been deposited with the American Documentation 
Institute. Order Document No. 7644 from ADI 
Auxiliary Publications Project, Photoduplication Serv- 


TABLE 6 


EGO MECHANISM DIFFERENCES BETWEEN ACCELERATORS
AND DECELERATORS WITHIN SEX


(Extreme groups) 


Men p= Women p= 


Coping mechanism 
Objectivity .05" SM Objectivity .05* Ar 


Logical analy- 

sis 05" SM Sublimation .01* Ar 

Tolerance of 

ambiguity .05" SM 

Totalcoping .04%°SM Totalcoping .02"" Ar 

Expressive 

coping .05* SM _ Coping impulse 
economics 01^ Ar 
Controlled 
coping 05° Ar 


Empathy Oi GTS 


Additional measures 


Cognitive Cognitive 
mechanisms .05* SM mechanisms .05“ Ar 
1934 social 
status® 05 SM 
1934 social 
status * 10" Ar 


Defense mechanism 


Defense impulse Doubt 01" SM 
economics .05 Ar Projection oS LS 
Structured 
defense 05* SM 
Structured 
defense 05" TS 


Ego mechanism 


Detachment .05* SM _ Discrimina- 
Means-ends tion 05* Ar 
symboliza- 
tion 05* SM 
Delayed re- 
sponse 05" SM Delayed re- 
Sensitivity 0S* SM sponse 05" SM 
Impulse 
diversion 05 Ar 
Total ego .05* SM 


Note. SM is Sentence Meaning subtest; Ar is Arithmetic
subtest; TS is Total Score. Probability levels reported are
for two-tailed tests except in the case of a hypothesis test.
Extreme groups included approximately the upper and lower
25% of the total sample; the Ns varied slightly according to
the subtest analyzed.

a Accelerators’ M is higher.

b Hypothesis test (one-tailed test).

c Social class differences will be discussed in a later section.


ice, Library of Congress, Washington, D. C. 20540, 
Remit in advance $1.75 for microfilm or $2.50 for 
photocopies and make checks payable to: Chief,
Photoduplication Service, Library of Congress.




ences between accelerators and decelerators 
within sex for the total score and for the two 
subtests: Arithmetic and Sentence Meaning. 
These results were evaluated by an overall 
test of significance suggested by Block 
(1960). This procedure involves the genera- 
tion of an empirical sampling distribution, in 
this instance based upon the ego mechanisms’ 
ratings and combination scores, of randomly 
constituted groups of the subjects being 
studied. The general sampling distribution
thus accurately reflects the extent of intercorrelation
within the data pool and consequently
sets up more appropriate requirements
for rejecting the null hypothesis than
would have been the case with the normal 
curve. Within this frame of reference, com- 
parisons of difference scores on TGT subtests 
reveal a significant number of differences in 
ego functions between accelerators and de- 
celerators: (a) for both men and women on 
the Sentence Meaning subtest and (b) for 
women on the Arithmetic subtest. Differences 
do not reach significance in the case of the 
Total Score for either men or women nor on 
the Arithmetic subtest for men. Specifically, 
the Sentence Meaning subtest yielded 11 sig- 
nificant results for men and 4 for women. 
The occurrence of this number of results at 
= .05 level had an empirical probability level 
of = .01 for men and = .04 for women 
according to the empirical sampling distribu- 
tion generated for the sexes separately and 
for randomly chosen groups of a size compa- 
rable to the extreme IQ change groups. The 
empirical probability levels for the Arithmetic 
subtest results were = .10 for men and = .04 
for women. The probabilities for TGT Total
Score results were .80 for men and = .14 for
women.
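
The logic of such an overall test can be sketched as a resampling procedure. The following is a loose illustration of the idea as described in this section, not a reproduction of Block's (1960) method: the function names, the pooled t criterion, the fixed critical value of 2.0, the number of random draws, and the synthetic data are all assumptions made for the example.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Pooled-variance t ratio between two lists of scores."""
    n_a, n_b = len(a), len(b)
    variance = ((n_a - 1) * statistics.variance(a)
                + (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(variance * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        return 0.0
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def count_significant(group_a, group_b, t_crit):
    """Number of measures on which the two groups differ with |t| > t_crit."""
    n_measures = len(group_a[0])
    count = 0
    for j in range(n_measures):
        column_a = [row[j] for row in group_a]
        column_b = [row[j] for row in group_b]
        if abs(pooled_t(column_a, column_b)) > t_crit:
            count += 1
    return count


def empirical_p(data, group_size, observed_count, t_crit=2.0,
                n_draws=500, seed=0):
    """Share of randomly constituted group pairs that yield at least
    `observed_count` nominally significant differences."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        drawn = rng.sample(data, 2 * group_size)
        found = count_significant(drawn[:group_size], drawn[group_size:], t_crit)
        if found >= observed_count:
            hits += 1
    return hits / n_draws


# Synthetic, intercorrelated ratings: 50 subjects by 8 measures.
data_rng = random.Random(1)
subjects = []
for _ in range(50):
    common = data_rng.gauss(0, 1)
    subjects.append([common + data_rng.gauss(0, 1) for _ in range(8)])

p_value = empirical_p(subjects, group_size=12, observed_count=3)
```

Because the random groups are drawn from the subjects' own intercorrelated ratings, the count of chance "significant" results reflects that intercorrelation, which is the advantage over a normal-curve expectation that the text describes.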

Because of the deficiencies of the Total 
Score in distinguishing between the two groups 
and because of its necessarily imprecise mean- 
ing, it will not be used in subsequent anal- 
yses. The arithmetic change score for men 
will, however, be reported as an accompani- 
ment to the women’s analyses. 

Table 6 shows that a number of the same 
ego mechanisms have significant relationships 
to IQ change for both men and women. Consequently
these particular ego mechanisms
can be regarded as generally characteristic of 


subjects whose IQs have changed over this 
25-year period when the comparison of ac- 
celeration versus deceleration is within sex. 
The mechanisms which are more character- 
istic of accelerators than of decelerators for 
both sexes are: objectivity, total coping, 
cognitive mechanisms, and delayed response.

However, there are differences between the 
sexes in the ego variables associated with 
acceleration. Note that expressive coping 
characterizes male accelerators; controlled 
coping characterizes female accelerators (the 
highest loadings on the controlled coping 
factor involve impulse regulation, whereas the 
expressive coping factor does not particularly 
involve impulse regulation). Impulse eco- 
nomics is evidently centrally related to posi- 
tive IQ change for women but is not involved 
in IQ change for the men except as defenses 
against impulse are associated with decelera- 
tion. However, contrary to expectation, gen- 
erally neurotic and specifically cognitive de- 
fenses are more characteristic of women 
accelerators than of decelerators. (The sig- 
nificant difference for the factor of structured 
defense reflects the high loadings on all three 
defensive cognitive mechanisms: isolation, 
intellectualization, and rationalization.) The 
differences between male accelerators and de- 
celerators are, however, without contradic- 
tion: coping mechanisms of the expressive 
variety relate to acceleration in men; con- 
versely, defense mechanisms entailing impulse 
constraint or redirection are related to deceleration
in men.

Block’s (1960) overall significance test does 
not insist that an individual significant find- 
ing, occurring within a series of comparisons 
that are insignificant in toto, necessarily must 
be regarded as a chance occurrence. If other 
logically reciprocal and validating relation- 
ships can be found that further support the 
individual finding, it merits consideration. 
Examination of Table 6 will show that there 
are a number of individually significant re- 
sults among the two insignificant series of 
comparisons: (a) women’s total difference 
score: structured defense and projection as 
characteristic of accelerators and empathy as 
characteristic of decelerators; (b) men’s 
arithmetic difference score: impulse diversion
and defense impulse economics as characteristic
of deceleration. These results tend to
support the previous generalization that there 
is a neurotic, striving element in women’s 
acceleration but that neurotic impulse control 
is associated with deceleration in men. Find- 
ings to be reported in later sections of this 
paper also tend to substantiate this view. 

If the level of significance for reported 
findings is lowered to the .10 level, a number 
of actual reversals occur between men and 
women in the direction of the relationships of 
ego mechanisms to IQ change. As these pairs 
of reversals tend to define further the differ- 
ences in modes of acceleration for the two 
sexes, it is valuable to report them. Table 7 
(see Footnote 5) shows these results. 

Again the free coping aspects of male ac- 
celeration and the defensive aspects of male 
deceleration are emphasized in contrast to 
the channeled, defensive character of some 
aspects of female acceleration. 

Sex Differences within IQ Acceleration or 
Deceleration. The foregoing findings lead 
directly to a comparison of ego functioning 
between male and female accelerators and 
decelerators. In evaluating the results of these 
comparisons between the sexes, within accel- 
eration or deceleration, general sex differences 
in ego functioning must not be confused with 
differences in ego functions that characterize 
different modes of acceleration or deceleration 
for men or women. If a mechanism has also
been shown to be characteristic of a sex 
difference in acceleration or deceleration, 
Table 8 (see Footnote 5) reports the general 
sex difference again. However, results are not


TABLE 7

EGO MECHANISMS RELATED TO IQ CHANGE IN
OPPOSITE DIRECTIONS FOR MEN AND WOMEN

(Extreme groups) a

                         Probability level
Mechanism                Men          Women

Empathy                  .10 b TS     .01 TS
Regression               .10 TS       .10 b SM
Impulse diversion        .05 Ar       .10 b Ar
Impulse economics        .10 Ar       .10 b SM

Note. TS is Total Score; Ar is Arithmetic subtest; SM is
Sentence Meaning subtest. Probability levels reported are for
two-tailed tests.

a Extreme groups included approximately the upper and
lower 25% of the total sample; the Ns varied according to
the subtest analyzed.

b Accelerators’ M is higher.


reported if significant findings occur for all 
three comparisons—between the sexes gen- 
erally, between the sexes within acceleration, 
and between the sexes within deceleration. 
In this case the result would merely be a 
general sex difference. The absence of gen- 
eral sex differences for the total group is also 
shown in those instances in which a mecha- 
nism has been shown to characterize accelera- 
tion and/or deceleration in general. Table 8 
shows a great number of sex differences within 
IQ change for the verbal subtests; however, 
the men and women who change in arithmetic 
performance are evidently more alike than 
are the men and women generally. Only 
three variables significantly characterize the 
difference between the sexes on arithmetic 
deceleration: women decelerators are higher 
on doubt, delayed response, and empathy than 
men decelerators. 

All other results to be considered here are 
those which differentiate between the sexes on 
verbal IQ change. Two variables, ego dis- 
persion and general level of drive, show no 
difference between the sexes within accelera- 
tion or within deceleration although the sex 
difference between the total group is for both 
of these variables = .05. Evidently the
women in accelerating become more like men 
(and the men in decelerating become more 
like women) in total drive and ego dispersion. 
Sex differences within deceleration show that 
the decelerating men are higher on total de- 
fense and structured defense, whereas women 
decelerators are higher than their male coun- 
terparts on empathy. Specifically, male and 
female decelerators become more alike than 
the sexes generally are on: objectivity, in- 
tellectuality, logical analysis, means-ends 
symbolization, concentration, total coping, 
and IQ. Sex differences within acceleration 
closely follow the general sex differences with 
only two mechanisms at the = .10 level show- 
ing differences between the sexes which are 
not attributable to a general sex difference. 
They are: rationalization (men higher) and 
time reversal (women higher). Furthermore, 
there are no general sex differences which 
show that the accelerating women are more 
like the accelerating men than the sexes gen- 
erally. 

The suggestion previously advanced that 




TABLE 8

EGO MECHANISM DIFFERENCES BETWEEN THE SEXES
WITHIN ACCELERATION OR DECELERATION GROUPS

(Extreme groups) a

Columns, left to right: Mechanism; Sex difference with IQ change
not considered (Total group: 50 women, 49 men); Sex differences
within acceleration, Sentence meaning; Sex differences within
acceleration, Arithmetic; Sex differences within deceleration,
Sentence meaning; Sex differences within deceleration, Arithmetic

Objectivity 05 1 ns ns ns 
Intellectuality 01 05 ns ns ns 
Detachment 01 01 ns 05 ns 
Rationalization ns 10 ns ns ns 
Logical analysis .05 01 ns ns ns 
Means-end symbolization 01 01 ns ns ns 
Doubt ns ns ns ns 05” 
Delayed response ns ns ns ns 10” 
Concentration O1 01 ns ns ns 
Selective awareness 01 05 ns 10 ns 
Empathy ns ns ns 05” 10” 
Time reversal ns 10” ns ns ns 
Total defense ns ns ns 10 ns 
Total coping 10 05 ns ns ns 
Total ego 01 01 ns 10 ns 
IQ .05 10 ns ns ns 
Drive 05 ns ns ns ns 
Cognitive mechanisms 01 01 ns 05 ns 
Ego dispersion 05 ns ns ns ns 
Structured defense ns ns ns 10 ns 
1934 social status ns 10 10” 10” ns 


Note. Probability levels reported are for two-tailed tests.

a Extreme groups included approximately the upper and lower 25%
of the total sample. Ns varied slightly according to the subtest
analyzed.

b Women’s M is higher.


the women accelerators are characterized by
a control in coping and by neurotic elements,
whereas the men are more expressive and
freer in their accelerative trends, is supported
only when the comparison is within sex. However,
the results of this analysis do support
the mirror image of this hypothesis in that
the men who decelerate (probably thus violating
the cultural sex role expectancy) are
more defensive and less empathic and lose at
the same time a number of coping characteristics.
In this latter sense they are more like
the accelerating women. Further, the women
who decelerate are considerably more healthy
than the men who decelerate. The reader will
recall that when the women are compared
with themselves, decelerating women are less
defensive than the accelerating women in a
number of ways, even though they are not
more coping.

Differences between IQ Accelerators and
Decelerators within Sex and Ego Control. In
an attempt to specify further the properties of 
acceleration or deceleration for each sex, 
Block’s (1961) Ego Control scale was used in 
analyzing the differences between the two 
kinds of IQ change. This scale was constructed 
by a factor analysis of the items of the Cali- 
fornia Psychological Inventory (CPI) and 
includes those items which have high loadings 
on the first factor. Block has found substan- 
tially the same array of factor loadings on a 
number of different samples. Very high Ego 
Control scores are thought to reflect a neurotic
rigidity, and very low scores, an undercon- 
trolled impulsivity. This particular scale was 
chosen because it was independent of the 
ego mechanisms but still shared a somewhat 
similar psychological meaning. 

Groups were constituted for this analysis 
for the sexes separately and for each of the 
two subtests (Verbal and Arithmetic) as follows:
(a) high ego control accelerators versus
high ego control decelerators, (b) low ego 
control accelerators versus low ego control 
decelerators. 

The sample means of the Ego Control scale 
and of the IQ difference scores were used as 
the cutting point for the various groups. Ex- 
treme groups were not used in this analysis 
because it was necessary to augment the num- 
ber of subjects available to fill the various 
groups, which now had three membership re- 
quirements (sex, ego control, and IQ change 
score). All available subjects who fulfilled 
the requirements were used in this analysis; 
the Ns for the various groups are shown on 
Table 9 (see Footnote 5). 
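
The grouping rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the field names (`ego_control`, `iq_change`), the sample values, and the treatment of scores exactly at the mean (placed in the "low" or "decelerator" cell) are hypothetical and are not taken from the study.

```python
from statistics import mean


def constitute_groups(subjects):
    """Assign subjects of one sex to cells using sample means as cutting points.

    Each subject is a dict with hypothetical keys 'id', 'ego_control',
    and 'iq_change'. Scores exactly at the mean fall in the lower cell,
    which is an assumption; the text does not say how ties were handled.
    """
    ego_cut = mean(s["ego_control"] for s in subjects)
    change_cut = mean(s["iq_change"] for s in subjects)
    groups = {}
    for s in subjects:
        control = "high" if s["ego_control"] > ego_cut else "low"
        change = "accelerator" if s["iq_change"] > change_cut else "decelerator"
        groups.setdefault((control, change), []).append(s["id"])
    return groups


# Hypothetical subjects, for illustration only.
subjects = [
    {"id": 1, "ego_control": 30, "iq_change": 110},
    {"id": 2, "ego_control": 30, "iq_change": 90},
    {"id": 3, "ego_control": 10, "iq_change": 110},
    {"id": 4, "ego_control": 10, "iq_change": 90},
]
groups = constitute_groups(subjects)
```

The comparisons in the text are then made within a level of ego control, that is, between the two "high" cells and between the two "low" cells.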

There is evidently very little difference be- 
tween the ego functioning of men of high or 
low ego control who accelerated or deceler- 
ated in arithmetical skill. This finding is 
consistent with those already reported for
comparisons involving male subjects, as IQ
change in the arithmetical ability appears to 
have relevance to the ego functioning of the 
women but not of the men. The women who 
accelerated in arithmetical skill (when ego 
control is held constant) are seen to be higher 
on almost all of the measures of impulse eco- 
nomics, and most of these are measures of 
coping mechanisms. 

Verbal IQ gain for men, within high ego 
control, is associated with the same general 
coping characteristics which occurred in the 
previous analysis, when ego control was not 
held constant. However, the cognitive area 
is now more salient when ego control is high 
and is held constant. Interestingly, the pres- 
ent analysis reveals denial as characteristic 
of decelerating men with high ego control. 
This finding is replicated in the comparable 
analysis for the women, where denial is found 
to be characteristic of female decelerators of 


TABLE 9 


EGO MECHANISM DIFFERENCES BETWEEN ACCELERATORS AND DECELERATORS WITHIN EGO CONTROL
AND SEX GROUPS


Men p= Women TE Men hi Women pe 
Arithmetic Sentence meaning—continued 
ae A a Aue ae ata} pee analy 05” Reaction 
= 10 vers = 9 versus eans-end sym- formation 10” 
Substitution 10 Sublimation 05” bolization 05” Impulse trans- 
Drive 05 Impulse diver- Tolerance of formation 10” 
sion 01? ambiguity 05” Denial 05 
Suppression 10 Delayed re- Selective aware- 
Impulse re- 3 sponse 10” ness -10 
7 parit 10 Sensitivity lo” Primitive 
peewee yp) oe Cognitive mecha- defense 05 
Dus $ nisms 05" 
Controlled stats aly 
E 40° Denial .05 
fee cies Lt ae AS an ego control Low ego control 
(N = 10 versus 10) (N = 6 versus 10) ase ERE ns bi ee ae TA 5? 
IQ .10 Suppression .10" Di $ 4 
10 40° elayed re- d 
Coping impulse R ORE D d 
economics .10* reon 05 F 
Serin to Time reversal 05 
ee å Total defense a0" 
ice meaning Denial .10* 

High ego control High ego control Selective aware- 

(N = 9 versus 8) (N = 14 versus 8) ness .05 
Intellectualizing .10° Doubt 10° Drive doubt 05 
Intellectuality 01” Delayed re- 1934 social à 
Detachment Ko) kd sponse .10* status 10 


a Two-tailed test.
b Accelerators’ M is higher. 




both high and low ego control. Though we 
would expect denial to be related to decelera- 
tion by virtue of its extensive information 
denying properties, this relationship has been 
obscured in the previous analyses. 

The neurotic, defensive aspects of female 
acceleration appear rather clearly within this 
analysis. Accelerators of high ego control 
and accelerators of low ego control are both 
significantly higher on doubt and delayed re- 
sponse, but the difference between these two 
modes of positive IQ change is suggested by 
the difference between impulse transforma- 
tion (which relates to high ego control) and 
regression, time reversal, and total defense 
(which all relate to low ego control). 

Relationships of IQ Change to Sex Typing. 
Sontag, Baker, and Nelson (1958) have sug- 
gested the possibility that strong femininity 
might be characteristic of the decelerating 
female. They have reported that high femi- 
ninity characterized girls who decelerated in 
IQ between the ages of 6 and 10; whereas 
the present study considers IQ change be- 
tween the approximate ages of 12 and 37. 
Since adolescence is probably the crucial time 
for a more or less self-conscious and intensi- 
fied selectivity in regard to sex roles, the 
present research plan could not precisely 
replicate the Fels findings. The impact of 
sexual impulses and society’s declaration that 
adolescence is the time for the solution of 
heterosexual identity may cause some con- 
fused, artificial, and temporary solutions. 
Women will frequently report clinically that 
previous role learnings or accomplishments 
were consciously cast aside during adoles- 
cence. In attempting to achieve a heterosexual 
identity, a number of girls may not be able to 
differentiate sexual functions from intellectual 
functions, and this failure of differentiation 
could account for intellectual deceleration 
either temporarily or permanently. 

Three approaches to testing the hypothesis 
of a negative relationship between femininity 
and IQ acceleration seemed appropriate and 
were within the data resources of OGS: 


1. Differences in adolescent feminine be- 
havior (rating at 16 years of age in a free- 
play situation based on an average of three 
staff raters) of extreme IQ change groups 

2. The differences in adult femininity
(Femininity scale of the CPI taken during the
adult follow-up study) of extreme IQ change 
groups 

3. Differences in IQ change scores for sub- 
jects who have been consistent or inconsistent 
in the level of their femininity between ado- 
lescence and adulthood. (Consistency or in- 
consistency was defined by the subject’s posi- 
tion above or below the sample mean of the 
adolescent and the adult femininity measures; 
it was necessary to use all female subjects for 
this analysis, so that the various groups 
could be constituted.) 

The consistency of femininity over time 
is, however, subject to the fact that the two 
measures are not entirely consistent in their 
frames of reference, the adolescent one being 
a behavior rating and the adult one being a 
self-report. The adolescent feminine behavior 
rating for age 16 was chosen rather than a 
rating for an earlier age, because it was 
thought that most girls would be involved by 
this age in an attempt to solve their sexual 
identity; the same comparisons were made 
for the male subjects without any strong ex- 
pectancies, however, that masculinity per se 
has any particular effect on IQ change. The 
results for the adult and adolescent compari- 
sons are given in Table 10 for both men and 
women and for both the Verbal and Arith- 
metical subtests. The findings show that the 
hypothesis relating IQ deceleration to high 
femininity is supported in two instances and 
contradicted in two others: Marked feminine 
behavior during adolescence is associated with 
deceleration in verbal ability but with acceleration
in numerical ability (the probability
of this latter result is .18). On the
other hand high scores on the CPI Femininity 
scale in adulthood are associated with ac- 
celeration in verbal ability and deceleration 
in numerical ability. 

In the case of the men none of the rela- 
tionships between masculinity and IQ change 
is significant; consequently, they will not be 
studied or discussed further in this paper. 

The relation between IQ change and the 
consistency or inconsistency of adolescent 
femininity with adult femininity was investi- 
gated by dividing the entire sample of female 
subjects into four groups and comparing them 
on the dimension of change in IQ. 




TABLE 10

DIFFERENCES BETWEEN FEMALE ACCELERATORS AND DECELERATORS IN SEX TYPING
(Extreme groups) a

                                   Accelerators      Decelerators                 Probability
                                   Mean     SD       Mean     SD       t ratio    level
Women's sentence meaning
  Adolescent feminine behavior     46.31    13.93    58.73    14.82    2.138      .02
  Adult feminine score (CPI)       25.83     3.83    22.92     4.01    1.775      .08
Women's arithmetic
  Adolescent feminine behavior     56.50    11.19    50.15     9.41    1.411      .18
  Adult feminine score (CPI)       22.30     3.35    24.75     3.65    1.552      .06
Men's sentence meaning
  Adolescent masculine behavior    47.15    15.98    52.50    15.07     .781      ns
  Adult masculine score (CPI)      15.79     3.98    15.27     2.93     .387      ns
Men's arithmetic
  Adolescent masculine behavior    54.50    12.70    56.45    15.09     .336      ns
  Adult masculine score (CPI)      15.69     3.95    15.00     2.10     .470      ns

a Extreme groups included approximately the upper and lower 25% of the total
sample; Ns varied slightly according to the subtest analyzed.
b Hypothesis test (one-tailed test).
c Decelerators' sex role M is higher.
d Two-tailed test; masculinity on CPI is shown by low scores.


The groups were constituted as follows: 


1. Consistent high femininity in adolescence
and adulthood—above mean on adolescent 
behavior rating and on adult CPI Femininity 
scale (consistent High Fe); N = 8 

2. Consistent low femininity in adolescence 
and adulthood—below mean on adolescent be- 
havior rating and on the adult CPI Femi- 
ninity scale (consistent Low Fe); N = 10 

3. High femininity in adolescence and low 
femininity in adulthood—above mean on ado- 
lescent behavior rating and below mean on 
adult CPI Femininity scale (High to Low 
Fe); N = 8

4. Low femininity in adolescence and high
femininity in adulthood—below mean on 
adolescent behavior rating and above mean on 
the adult CPI Femininity scale (Low to 
High Fe); N = 7

The results of the comparisons between these
groups are shown in Table 11. 
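
The four-way classification above, and a pairwise comparison of two such groups, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, scores exactly at a mean are counted as "low", and the pooled-variance t ratio is one standard formula that is not confirmed to be the computation used for Table 11. The numbers in the example are invented, not taken from the study.

```python
from math import sqrt


def sex_typing_group(adolescent, adult, adolescent_mean, adult_mean):
    """Classify one subject by position relative to the two sample means."""
    high_then = adolescent > adolescent_mean  # adolescent behavior rating
    high_now = adult > adult_mean             # adult CPI Femininity score
    if high_then and high_now:
        return "consistent high Fe"
    if not high_then and not high_now:
        return "consistent low Fe"
    if high_then:
        return "high to low Fe"
    return "low to high Fe"


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t ratio with pooled variance, from summary statistics."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


# Invented values, for illustration only.
label = sex_typing_group(adolescent=40, adult=26, adolescent_mean=50, adult_mean=24)
t = pooled_t(110, 10, 8, 100, 10, 8)
```

With group sizes as small as those listed above (7 to 10 subjects per group), a t ratio of this kind rests on very few observations, which is worth keeping in mind when reading the comparisons.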

The comparisons of the four groups with 
each other, within the two types of IQ change, 
yielded no results of significance for IQ change 
in arithmetic, but were significant in three
cases for verbal IQ change. These were:
(a) Consistent High Fe versus Low to High
Fe, (b) Consistent Low Fe versus Low to High
Fe, (c) High to Low Fe versus Low to High 
Fe. In all cases the girls who had been low 
in feminine behavior at age 16, but who 
were above the mean on the CPI Femininity 
scale as adults, accelerated as compared with 
the other groups. Note that the arithmetical 
change results reverse this process. Those 
who were high on feminine behavior at 16 
years of age and subsequently low on adult 
femininity show the most accelerative change 
in arithmetical capacity, but the most decelerative
change in verbal ability. However,
the generally decelerative means in arithmetic
and sentence meaning for females who have
been consistently feminine over time appear 
to confirm, albeit weakly, the Sontag hypothe- 
sis. At the same time this analysis suggests 
that there is greater complexity in the rela- 
tionships between IQ change and sex role than 
was previously supposed. As it turns out, 
femininity according to the present measures 
appears to be no more constant than IQ, and 
when the two are related to each other the 
kind of IQ, verbal or arithmetical, makes 
considerable difference. In general girls who 
accelerated in verbal ability were tomboys as 
adolescents (or perhaps merely slow in developing
femininity), but by adulthood they
have become more feminine. Girls who accelerate
in arithmetical ability reverse the
procedure: as adolescents they are high in
femininity, but by adulthood they have become
less feminine.


TABLE 11

DIFFERENCES BETWEEN WOMEN OF VARIOUS PATTERNS OF SEX TYPING IN IQ CHANGE

                                   Arithmetic         Sentence meaning
Groups                      N      Mean     SD        Mean      SD
1. Consistent high Fe        8     86.0     15.58      86.3     28.86
2. Consistent low Fe        10     94.2     20.84      93.5     23
3. High to low Fe            8     97.8     28.24      86.0     32.43
4. Low to high Fe            7     93.3     23.72     121.0     19.23

                                                      t ratio a      t ratio a
                                                      (Arithmetic)   (Sentence meaning)
Consistent high Fe versus consistent low Fe            .906            .541
Consistent high Fe versus high to low Fe              1.062            .020
Consistent high Fe versus low to high Fe               .719           2.619**
Consistent low Fe versus high to low Fe                .264            .487
Consistent low Fe versus low to high Fe                .077           2.339*
High to low Fe versus low to high Fe                   .305           2.329*

Note. The mean scores above show IQ acceleration if they are above 100;
if they are below 100, they show IQ deceleration. These figures, as was
stated before, represent the difference between ranks for the two time
periods plus a constant of 100. Femininity is abbreviated Fe.
a Two-tailed test.
* p = .04.
** p = .02.


IQ Change, Social Class, and Years of Education.
Two social class ratings were available
for the subjects: an early adolescent
rating made on the father's occupational
status in 1934, and an adult rating made on
the basis of the subject's (or the husband's)
occupational status in 1957-58. The Edwards
(1933) scale, which divides the group into
the five following categories, was used for
both ratings: (a) professional, (b) managerial,
(c) clerk and small proprietor,
(d) skilled, and (e) unskilled. The two
ratings were part of all analyses involving
the ego mechanisms. Note that these
various analyses have used extreme groups
and total groups; men and women; and
Arithmetical, Verbal, and Total IQ change
scores. Social class proved to be related to a
group difference in only one instance: women
decelerators on the verbal subtest came from
a higher 1934 social class than the women who
accelerated on the same test (p ≤ .05). This
single result can probably be regarded as a
chance occurrence.

There is the possibility that acceleration
in IQ might be simply and positively related
to the number of years that an individual had 
been educated. But positive findings in this 
direction would not close all questions regard- 
ing the relation of personality to IQ change, 
because the individual’s participation in the 
selection or rejection of more or less educa- 
tion would still remain at issue. In any case, 
an attempt to test the relationship of IQ
change to education for both the entire sam- 
ple and the extreme groups did not reveal 
significant differences between accelerators 
and decelerators on the Verbal or Arithmetic 
subtests for either sex, although the results 
are in the expected direction. The difference 
in educational level between the men (extreme 
groups) who accelerated or decelerated on 
the Verbal subtest had a probability level of 
.16. This result is the nearest that any come
to being significant. 


Discussion 


This paper has two purposes: generally to 
demonstrate the usefulness of a model of ego 
functioning, and specifically to report the re- 
lationships found between the ego mechanisms 
and selected measures of IQ change. Since 
judgment in regard to the usefulness of any 
model is essentially a pragmatic matter, a 
discussion of the findings, ramifications, and 


20 Norma Haan 


meanings of the specific purpose is probably 
the best way to fulfill the general purpose. 

There are several limitations of the present 
work that need to be noted at the outset. The 
two subtests of TGT used in this work take 
6 minutes each to administer, and many of 
the results of this paper are based on these 
two 6-minute samples of behavior which are 
separated in time by 25 years. The logical 
coherence and breadth of the results militate 
against the possibility that they are fortuitous, 
but a considerably more powerful measure of 
IQ change might have been derived had in- 
dividual intelligence tests, given over succes- 
sive years, been available. The present study 
avoids the methodological difficulties of prac- 
tice effects which are bothersome to studies 
using change scores based on successive meas- 
ures over a number of years; but at the same 
time the present results have probably been 
attenuated by the less than optimal IQ change 
measures. If a test more difficult than TGT 
had been used for the adult years, IQ change 
could have been defined in an absolute sense, 
since the meaning of high scores would have 
been less equivocal. As it is, the results of this 
study have a delimited frame of reference. 
They apply to IQ change in a relative sense, 
i.e., IQ change is measured in terms of the
individual’s shifting his position—up or 
down—within this particular group of sub- 
jects. 
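
A relative change score of this kind can be sketched as follows. The note to Table 11 describes the score as the difference between ranks for the two time periods plus a constant of 100, with values above 100 indicating acceleration. The details below are assumptions made for illustration: simple within-group ranks are used (rank 1 is the lowest score, tied scores share the mean rank), and the scores are invented. The study may have used a different rank scale.

```python
def ranks(scores):
    """Return within-group ranks; rank 1 is the lowest score, ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next score is tied with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = shared_rank
        start = end + 1
    return result


def relative_change(early_scores, late_scores, constant=100):
    """Later rank minus earlier rank, plus a constant, for each subject."""
    early = ranks(early_scores)
    late = ranks(late_scores)
    return [constant + l - e for e, l in zip(early, late)]


# Invented scores for four subjects at the two testings.
early_scores = [50, 60, 70, 80]
late_scores = [55, 90, 65, 85]
change = relative_change(early_scores, late_scores)
```

In this invented example the second subject moves up two positions within the group, and the third and fourth each move down one, even though every subject's raw score could have risen. This is the sense in which the measure is relative to the particular group of subjects.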

Since no psychologist expects any longer 
to substantiate a monolithic constancy of IQ, 
interest has turned to identifying factors that 
are associated with its fluctuations. One such 
group of variables that has been studied may 
be grossly categorized as being associated 
with the inherent processes of hereditary 
mental growth patterns, which are sometimes 
assumed to have physical correlates. Sontag 
and his associates (1958) have been unable 
to find relationships between a number of 
physical growth factors and IQ change; the 
present study has not been concerned with 
this possibility at all. Bradway and Robin- 
son (1961), after completing their 25-year 
follow up, have come to the conclusion that 
their ancestral index (based upon parental 
intelligence and grandfather’s occupational 
intelligence) has no important effect on IQ 
change after the onset of adolescence. They
had previously reported significant relationships
between IQ change and the ancestral
index for younger age groups. In the present 
study a somewhat similar variable, the social 
class of origin, has been shown to have no 
systematic relationship to acceleration or de- 
celeration of IQ. 

The present paper, however, has focused 
on the relationships of IQ change to per- 
sonality variables and is based upon the as- 
sumption that IQ change (and in a broad 
sense IQ itself), as it can be measured by 
psychologists, is a personality variable. The 
Menninger Coping Project has been concerned 
with the coping response patterns of their 
preschool subjects to the demands of intelli- 
gence testing situations: Moriarty (1961) 
has reported relatively low correlations be- 
tween overall coping capacity and IQ (r = .26 
for girls; r = .20 for boys) and similarly low 
relationships between specific coping patterns 
and IQ. Although some of the relationships 
between coping mechanisms and IQ in the 
present study are of the same magnitude as 
those reported by Moriarty, a number are 
greater. It may well be that the older subject’s 
increased internal maturity and accumu- 
lation of experience with societal expecta- 
tions result in more differentiated relation- 
ships between formal intelligence and coping 
functions. Thus, while a highly motivated 
and organized approach to an intelligence test 
may very well reflect a coping effort in the 
case of an adult, the same approach on the 
part of a preschool child may reflect a neu- 
rotic compulsive precocity since his motiva- 
tion for doing well on this kind of a test 
cannot be expected to be entirely self-defined. 

Personality-intelligence findings are always 
confounded by the sequential question: did 
the raters tend to give higher coping ratings 
to subjects because they were more intelli- 
gent and higher defense ratings to subjects 
because they were less intelligent, or is cop- 
ing or defensiveness the primary factor, pro- 
ducing, in its wake, a higher or lower IQ? 
Although the relationships of the absolute 
level of IQ to ego mechanisms tend to be ob- 
scured by this methodological problem, the 
relationships of the ego mechanisms to IQ 
change are pure in this regard: all levels of 
absolute ability are conglomerated under the
general rubric of change. Consequently the
relationships of personality variables to IQ 
change are not confounded with the absolute 
level of intelligence. 

After their investigation Sontag, Baker, 
and Nelson (1958) came to the conclusion 
that the theoretical construct of intelligence is some- 
what artificial in nature, reflecting not only motiva- 
tional aspects of content learning, but also reflecting 
another level of learning which has been termed 
“learning to learn” [p. 137]. 


The Sontag variables (independence, aggres- 
siveness, competitiveness, anxiety, etc.), how- 
ever, tap very different aspects of personality 
from the ego mechanisms. Genetic explora- 
tions utilizing these two different formulations 
would lead to different questions and, there- 
fore, to different answers. The ego mecha- 
nisms have a theoretical basis, and they are 
represented in this paper by explicit judg- 
ments on the part of skilled clinical judges 
when they were asked to attend specifically 
to one aspect of personality structure—the 
ego. However, the conceptualization of ego 
structure, as it has been historically defined, 
and specifically defined in this paper, likewise 
implies a function, that of modulating behav- 
ior irrespective of its content. The ego mecha- 
nisms, then, are functional or process varia- 
bles and the rating does not reflect the extent 
of an individual’s aggressiveness, but rather 
the extent to which he uses one or another 
process in handling his inevitable aggressive- 
ness, e.g., does he displace it, deny it, subli- 
mate it, or suppress it? The model avoids an 
inherent and self-limiting aspect of the Sontag 
scheme; aggressiveness, independence, self- 
initiation, etc., were all found to be character- 
istic of accelerators, but there may very well 
be a point of diminishing returns: a great 
deal of childhood aggressiveness probably does 
not lead to a great deal of intellectual gain. 
The difficulty is demonstrated by another 
paper from the Fels Institute (Kagan et al., 
1958): After finding a significantly greater 
amount of aggressive content in Rorschachs 
of boys who accelerated in IQ, Kagan and 
his associates note that Rorschach content per 
se cannot be understood to be a direct measure 
of the quantity of aggressivity, and they caution
against such conclusions. However, the
conceptual framework employed appears to
produce the necessity for the caution. Because
the ego mechanism variables are process
variables at a single level of inference, and 
health and pathology are explicitly defined, 
their interpretation should be somewhat less 
equivocal.

Since the Fels conception of IQ accelera- 
tion as a matter of “learning to learn” is ge- 
nerically related to reinforcement theory, one 
may well ask how is a child taught to learn 
to learn. If coping mechanisms lead to IQ 
acceleration and defense mechanisms lead to 
deceleration, as the present study suggests 
they generally do, then the ego mechanism 
model would ask the genetic question as to 
what experiences enhance the freedom of the 
child for coping and prevent the development 
of defenses which close off the individual from 
experience and intellectual growth? 

The results of the present paper are, of 
course, in need of replication, as they fre- 
quently involve small groups and stand more 
or less without comparability. Future work in 
this area would benefit by attention to the 
varying kinds of intellectual performance. The
results of the present study indicate that 
different processes may be involved in verbal 
and numerical tasks. The relationships found 
between femininity and its changes and IQ 
change suggest that attempts to relate per- 
sonality to IQ change would do well to con- 
sider changes in personality variables over 
time, as well as changes in IQ. Previous 
studies—and much of the present analysis— 
have related IQ change (a measure observed 
over time) with personality variables (ob- 
served once only); this failure to compare 
IQ change with personality change now ap- 
pears a methodological weakness. 


SUMMARY 


The patterns of coping and defense mecha- 
nisms that relate to the absolute level of in- 
telligence in adolescence and to IQ change 
between early adolescence and middle adult- 
hood have been studied with a longitudinal 
sample, the subjects of OGS. Intelligence was 
found to be positively related to coping 
mechanisms and negatively related to de- 
fense mechanisms for both men and women. 
Two kinds of IQ change were studied, verbal
and arithmetical, and somewhat different results
were obtained for each. However, coping
is generally related to IQ acceleration and
defense to deceleration. Men were generally 
more accelerative in the various kinds of 
intelligence than women. Male accelerators 
were generally coping in an expressive manner,
whereas women accelerators were coping
in a controlled manner but had neurotic
types of defenses as well. Men and women 
who accelerated or decelerated in an arithmetic 
subtest were more alike in ego functioning 
than the sexes were generally. Men who de- 
celerated on a verbal subtest appeared cogni- 
tively similar to the women and, additionally, 
became more defensive; whereas women who 
decelerated were considerably more healthy 
than their male counterparts. The relationship
of IQ change to femininity and masculinity
during adolescence and adulthood and
the kind of change in sex roles between these 
two periods of life were also investigated. 
The results for males were inconclusive, but 
it was found that women who accelerated the 
most in verbal tasks tended to be low in 
femininity in adolescence and high in femi- 
ninity in adulthood; women who showed the 
greatest acceleration in arithmetic tended to 
have the reverse pattern—high in femininity 
in adolescence and low in femininity in adult- 
hood. In still another analysis, the amount 
of ego control was held constant by the use 
of an independent measure; denial and a 
primitive defense factor were also found to 
characterize both men and women who de- 
celerated on the verbal tasks. Neither ado- 
lescent nor adult social class level was found 
to be importantly related to IQ change. 


REFERENCES 


ALLPORT, G. W. Personality: A psychological interpretation.
New York: Holt, 1937.

BAYLEY, NANCY. Consistency and variability in the
growth of intelligence from birth to eighteen years.
J. genet. Psychol., 1949, 74, 165-196.

BLOCK, J. On the number of significant findings to
be expected by chance. Psychometrika, 1960, 25,
369-380.

BLOCK, J. An Ego Control scale. Unpublished manuscript,
Institute of Human Development, University
of California, Berkeley, 1961.

BRADWAY, KATHERINE, & ROBINSON, NANCY. Significant
IQ changes in twenty-five years: A follow-up.
J. educ. Psychol., 1961, 52, 74-80.

EDWARDS, ALBA. A social and economic grouping of
the gainful workers of the United States. J. Amer.
Statis. Ass., 1933, 28, 377-387.

FENICHEL, O. The psychoanalytic theory of neurosis.
New York: Norton, 1945.

FRENKEL-BRUNSWIK, ELSE. Motivation and behavior.
Genet. Psychol. Monogr., 1942, 26, 121-265.

FREUD, ANNA. The ego and the mechanisms of defense.
London: Hogarth, 1937.

GULLIKSEN, H. Theory of mental tests. New York:
Wiley, 1950.

HAAN, NORMA. Some comparisons of various Oakland
Growth Study subsamples on selected variables.
Unpublished manuscript, Institute of Human Development,
University of California, Berkeley, 1962.

HARTMANN, H. Psychoanalysis and the concept of
health. Int. J. Psychoanal., 1939, 20, 308-321.

HARTMANN, H. Notes on the theory of sublimation,
Vol. X. Psychoanal. Stud. Child, 1955, 10, 9-30.

JONES, H. E., MACFARLANE, JEAN W., & EICHORN,
DOROTHY H. A progress report on growth studies
at the University of California. Vita hum., 1959,
3, 17-31.

KAGAN, J., SONTAG, L. W., NELSON, VIRGINIA L., &
BAKER, C. T. Personality and IQ change. J.
abnorm. soc. Psychol., 1958, 56, 261-266.

LAMPL-DE GROOT, JEANNE. On defense and development:
Normal and pathological. Psychoanal. Stud.
Child, 1957, 12, 127-150.

MILLER, D., & SWANSON, G. E. Inner conflict and defense.
New York: Holt, 1960.

MORIARTY, ALICE. Coping patterns of preschool children
in response to intelligence test demands. Genet.
Psychol. Monogr., 1961, 64, 3-128.

MURPHY, LOIS B. Coping devices and defense mechanisms
in relation to autonomous ego functions.
Bull. Menninger Clin., 1960, 24, 144-153.

MURPHY, LOIS B. The widening world of childhood:
Paths toward mastery. New York: Basic Books,
1962.

RAPAPORT, D. The autonomy of the ego. In R.
Knight & C. Friedman (Eds.), Psychoanalytic
psychiatry and psychology. New York: International
Universities Press, 1954. Pp. 248-259. (a)

RAPAPORT, D. On the psychoanalytic theory of
thinking. In R. Knight & C. Friedman (Eds.),
Psychoanalytic psychiatry and psychology. New
York: International Universities Press, 1954. Pp.
259-273. (b)

SONTAG, L. W., BAKER, C. T., & NELSON, VIRGINIA L.
Mental growth and personality development: A
longitudinal study. Monogr. Soc. Res. Child
Develpm., 1958, 23(2, Whole No. 68).

STEWART, L. Social and emotional adjustment during
adolescence as related to the development of psychosomatic
illness in adulthood. Genet. Psychol.
Monogr., 1962, 65, 175-215.

SWANSON, G. E. Determinants of the individual's defenses
against inner conflict: Review and reformulation.
In J. C. Glidewell (Ed.), Parental attitudes
and child behavior. Springfield: Charles C Thomas,
1961. Pp. 5-41.


(Received March 8, 1963) 


Vol. 77, No. 9                                        Whole No. 572, 1963


Psychological Monographs: General and Applied


HERITABILITY OF PERSONALITY:
A DEMONSTRATION 1


IRVING I. GOTTESMAN 2
Harvard University


Ss were 68 pairs of adolescent twins from the public schools of an urban area.
They comprised 60% of the same-sexed twin population. Zygosity diagnosis
was determined by blood grouping for 9 groups; half of the sample was identical
(MZ) and half was fraternal (DZ). Personality was assessed by the MMPI
and Cattell's HSPQ. 24 standard scales were analyzed by intraclass correlations
(R) and heritability indexes (H). Holistic analyses of MMPI profile
similarity were done clinically and statistically. 6 personality scales had significant
genetic components as revealed by higher MZ Rs. 11 scales had appreciable
hereditary variance. Pooled accuracy of profile similarity judgments matching
zygosity was 68% (p = .005). The general idea that psychopathology
in man has a substantial genetic component, especially the psychoses, was supported.
A dimension of introversion was the most heavily influenced by
genetic factors.


Mae in general and life scientists in 
particular have long been curious 
about the basic nature of man. In recent years 
curiosity and speculation have given way to 
experimentation and controlled observation. 
One of the eventual outcomes of such research 
will be a knowledge of the sources of the in- 
dividual differences in human behavior so that 
the variation may be explained, predicted, or 
controlled. The genetic source of variation in 
human personality is the focal concern of the 
present research. Genetic is used in the strict 
sense to refer to that science launched by 
Mendel’s work with peas and not, as is com- 
mon in psychology, as an abbreviation for 
ontogenetic. Although it is axiomatic that the 


1This paper is based upon a doctoral dissertation 
submitted in partial fulfillment of the requirements 
for the PhD degree at the University of Minnesota, 
1960. 

I wish to express my appreciation to my adviser 
and mentor, R. D. Wirt. I was fortunate to have also 
had the encouragement and friendship of S. C. Reed, 
Director of the Dight Institute for Human Genetics, 
University of Minnesota, Minneapolis. 

The cooperation of the Minneapolis, Saint Paul, 
and Robbinsdale, Minnesota, public school systems 
through the efforts of H. Cooper, N. C. Kearney, 
and F. C. Gamelin, Assistant Superintendents, is 
gratefully acknowledged. A number of other persons 
contributed their skills and support to this study. 
Among them are Marianne Briggs, Joan Drues, A. C. 
Wahl, and Jane Swanson. Conversations with 
D. Freedman, R. Rosenthal, and D. S. Jones helped 


phenotypic expression of a trait is dependent 
upon the resultant of interaction between geno- 
type and environment, much heat and little 
light have been generated by attempts to answer
the question, “How much of Trait X is 
due to heredity and how much to environ- 
ment?” Since neither agent alone can produce 
the observed behavior, the question as stated 
is meaningless. The question has precise mean- 
ing only when framed in terms of the variation 
between individuals. Two answerable questions
should be posed in the nature-nurture 
issue: (a) How much of the variability ob- 
served within a group of individuals in a speci- 
fied environment on a specific measure of a 
specific trait is attributable to genetic factors? 
(b) How modifiable by systematic environ- 


me to clarify a number of my thoughts. I benefited 
from the comments of S. G. Vandenberg and W. R. 
Thompson on this revised manuscript. Other indi- 
viduals are given credit in later pages for their spe- 
cific contributions. 

Funds in support of this research were provided 
by the Tozer Foundation of Stillwater, Minnesota, 
and the Dight Institute for Human Genetics. The 
large expense of blood typing was borne by the Min- 
neapolis War Memorial Blood Bank through the 
interest of G. A. Matson. Subsequent analyses of 
the data for sex differences were financed by a 
grant from the Laboratory of Social Relations, 
Harvard University. Preparation of this monograph 
was aided in part by Grant M-5384 from the National 
Institutes of Mental Health. 

2 Formerly at the University of Minnesota. 


2 IRVING I. GOTTESMAN 


mental manipulation is the phenotypic expres- 
sion of each genotype? The answers to these 
questions will vary according to age, sex, cul- 
ture, trait, and method of assessment. The 
importance of genetic factors will range from 
almost none to overwhelming for those indi- 
viduals who have been found to have either 
more or less than 46 chromosomes (Lejeune & 
Turpin, 1961). 

The genes exert their influence on behavior 
through their effects at the molecular level of 
organization. Enzymes, hormones, and neu- 
rons may be considered as the sequence of 
complex path markers between the genes and 
the aspects of behavior termed personality. 
The inability of behavioral genetics to demon- 
strate this type of reduction at the present 
time need be no more embarrassing than the 
lack of information concerning the biochemi- 
cal changes associated with habit formation 
is to the psychology of learning. It may be 
that measures of behavior qua behavior are 
the only reliable indicators of certain kinds of 
genetic differences (Fuller, 1957; Fuller & 
Thompson, 1960). For our purposes the best 
way to conceptualize the contribution of he- 
redity to a personality trait is in terms of 
heredity’s determining a norm of reaction 
(Dobzhansky, 1955) or of fixing a reaction 
range (Gottesman, 1963). Within this frame- 
work a genotype determines an indefinite but 
circumscribed assortment of phenotypes, each 
of which corresponds to one of the possible 
environments to which the genotype may be 
exposed. Allen (1961) pointed out that the 
most probable phenotype of some genotypes 
may be such a deviant one that even the most 
favorable of currently known environments 
would not suffice to bring it within the normal 
range. 

Within the broad context of evolution the 
demonstration of heritable components for 
personality traits involves more than an aca- 
demic exercise. The one nonrandom genetic 
process which accounts for the adaptive orien- 
tation of evolution is differential success in 
reproduction. If there are heritable aspects to 
some personality traits and if there is assorta- 
tive mating for these traits, the frequencies of 
the associated genes will increase in the gene 
pool of the population. Tryon (1957) has 

proposed a behavior genetics model of society 


which suggests that relative reproductive iso- 
lation between social strata plus social mo- 
bility could account for some of the class 
differences observed in achievement and 
personality. 

In the brief history of contemporary psy- 
chology the search for genetic aspects of per- 
sonality has been dominated by an under- 
standable emphasis on mental illness and 
mental deficiency (Allen, 1958; Kallmann, 
1959). Almost all of such research has oc- 
curred within a context of classical Men- 
delian major gene mechanisms. The inappro- 
priateness of this model for the observed 
quantitative variation in normal personality 
traits has been one of the inhibitors to inves- 
tigations by psychologists. Although the clas- 
sical study by Newman, Freeman, and 
Holzinger (1937) focused on intelligence and 
achievement, a number of the personality 
tests then available were included in the 
battery administered to their large sample of 
twins. From their results the authors con- 
cluded, 

The only group of traits in which identical twins 


are not much more alike consists of those commonly 
classed under the head of personality [p. 352]. 


With very few exceptions (Cattell, Blewett, & 
Beloff, 1955; Vandenberg, 1962) this conclu- 
sion appears to have been accepted as a valid 
statement of the relationship of genetics to 
normal personality. Improvements in per- 
sonality measurement (Cronbach, 1960) and 
a new era of sophistication about the construc- 
tion and application of psychometric devices 
(Cronbach & Meehl, 1955; Loevinger, 1957) 
make it possible for us to re-examine the 
relationship in question. This study is an 
effort in this direction. 

The present research attempted to improve 
upon the methodology of previous twin studies 
by incorporating a number of refinements. 
The sample was selected after the entire 
population of same-sex twins had been 
enumerated in a large area; the represent- 
ativeness of the sample to the population was 
then ascertained. Accuracy and objectivity of 
zygosity diagnosis were ensured by means of 
serological tests for 46 different phenotypes 
in nine different blood group systems and by 
the use of fingerprints, height, and photo- 
graphs. Two types of objectively scored 


HERITABILITY OF PERSONALITY 3 


personality inventories were used, each pur- 
porting to be comprehensive but parsimonious 
in its characterization of the personality do- 
main. Unlike the tests which historically have 
been used in an effort to elucidate the “nature 
and nurture” of personality, both types of tests 
in the present research have a recognized 
claim on construct validity (Cronbach, 1960, 
p. 122). One test depends upon the process 
of factor analysis for the derivation of “pure” 
trait measures while the other stems from item 
analyses which separate criterion groups on 
various empirical dimensions. 

A primary goal of the present research is 
to answer the first question posed above about 
how much of the variability on various traits 
for a specified sample reared in a particular 
environment is attributable to genetic factors 
and how much to environmental. Earlier 
criticisms of this strategy (Anastasi, 1958; 
Loevinger, 1943) are well taken but do not 
render it obsolete. Fuller (1960) has noted 
that, while the question has no significance 
for an individual, the contribution of heredity 
to total variance in a population 


is still a useful object of inquiry, though with in- 
creased sophistication we have come to see that the 
answer to “How much?” is not a universal constant 
[p. 43]. 


PROCEDURE 


This section gives the details of twin selection as 
well as a description of the first application in psy- 
chological research of the Smith and Penrose (1955) 
method for the determination of zygosity by ex- 
tended blood grouping. The efficiency of this method 
is compared with the methods of the past. Only 
questionnaire measures of personality were used so 
as to facilitate the handling of scale scores within 
the context of quantitative genetics; the tests and 
their validity and reliability are described. A brief 
rationale of the twin method is presented followed 
by a description of the scheme of data analysis. 


Selection of the Twin Sample 


In sound twin methodology, it is essential that the 
sample be a miniature of the population of twins. 
This ensures proportional representation of the two 
kinds of twins (identical or MZ and fraternal or DZ) 
and allows accurate genetic analysis with the com- 
puted heritability indexes or concordance rates. It is 
also essential that the two groups of twins in the 
sample be matched on as many variables as possible 
so that differences in variance cannot be attributed 
to differences in age, sex, intelligence, socioeconomic 


status, or other factors which may influence person- 
ality other than the independent variable, genotype. 

All class cards for the over 31,000 children in 
public school Grades 9 through 12 in the cities of 
Minneapolis, Saint Paul, and Robbinsdale, Minne- 
sota, were examined. All pairs of children with the 
same last name, same sex, same address, and same 
birthdate were recognized as the twin population 
available. Opposite-sexed fraternal twins were not 
included in the study in order to eliminate the 
questionable procedure of comparing a boy with a 
girl on the same personality traits. 

The best data available to date about the incidence 
of twin births in the United States “white” popula- 
tion are those of Strandskov and Edelen (1946) 
who reported that 1.129% of all births are twin 
births. Of these, one third are opposite-sexed fra- 
ternals, one third are same-sexed fraternals, and 
one third are identicals. The best known twin studies 
(Kallmann, 1946; Newman et al., 1937) assumed 
that only one fourth of all twins are identical. 

A total of 163 pairs of same-sex twins were located
in the schools’ files of 31,307 children. Based 
on the incidence of twin births, 1.129%, the expected 
incidence of same-sex twins would have been 237 
pairs. After calculating neonatal mortality, however, 
Allen (1955) found that the incidence of twins at 
1 month of age had already been lowered to .87 
pairs per 100 children (.87%). Based upon this in- 
cidence, the expected number of same-sex twin pairs 
in the entire population would have been 182. After 
1 month of age, the mortality of twins is the same 
as that of single born survivors. By subtracting the 
known mortality rate in the general population for 
children reaching the age of 15 (5%), the final expected
number of same-sex twin pairs was 173.³ At 
the time the present study was conducted, there were 
no adolescent twins in either the correctional or 
mental institutions (not counting the two housing 
mental defective and brain damaged cases) of the 
state. It would appear that virtually the entire 
population of same-sex adolescent twins in the 
public high schools of the three communities was 
enumerated. 
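
The arithmetic behind the three expected counts can be sketched as follows. This is a reconstruction under stated assumptions, not a procedure given in the text: the twin incidences are treated as pairs per 100 children, and the same-sex share is taken as two thirds rounded to .67, a value chosen here only because it reproduces the reported figures (an exact 2/3 gives about 236 rather than 237 for the first count).

```python
# Sketch of the expected-count arithmetic. The rounding of two thirds to .67
# is an assumption made here; the text does not say which value was used.
CHILDREN = 31_307
SAME_SEX_SHARE = 0.67  # assumed rounding of "two thirds of twin pairs are same-sex"


def expected_same_sex_pairs(children, pairs_per_child, survival=1.0):
    """Expected same-sex twin pairs among `children` for a given pair incidence."""
    return children * pairs_per_child * SAME_SEX_SHARE * survival


at_birth = expected_same_sex_pairs(CHILDREN, 0.01129)              # birth incidence
at_one_month = expected_same_sex_pairs(CHILDREN, 0.0087)           # after neonatal mortality
at_age_15 = expected_same_sex_pairs(CHILDREN, 0.0087, survival=0.95)  # less 5% mortality

for label, value in [("at birth", at_birth),
                     ("at 1 month", at_one_month),
                     ("at age 15", at_age_15)]:
    print(f"{label}: {value:.1f} pairs")
```

Against the final expectation of about 173 pairs, the 163 pairs actually located amount to roughly 94% of the expected population.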

The parents of all pairs in Minneapolis and Rob- 
binsdale and of those in the largest high school in 
Saint Paul were sent a letter describing the project 
and a return postcard on which was printed a medi- 
cal release authorizing blood typing. After 10 days, 
telephone calls were made to those who had not 
returned the card indicating the voluntary partici- 
pation of their children. After another 10 days, a 
second and last, hopefully persuasive, telephone call 
was made. These efforts secured the initial cooperation
of 26 pairs of boys and 48 pairs of girls. By 
the end of the study 6 pairs of twins had defaulted 
for various reasons: 1 pair was lost as a result of 
fear of the intravenous removal of the blood speci- 
men, one member of another pair had cerebral palsy 
and could not take the personality tests in the stand- 


3 Evaluation of the sampling adequacy was sug- 
gested by and facilitated by E. Anderson. 




ard manner, and 4 pairs were unavailable at the 
times provided for the tests which were Saturday 
afternoons and mornings. 

The final study sample, then, consisted of 23 pairs 
of boys and 45 pairs of girls. These 68 pairs, dis- 
regarding sex, represented 60.2% of the total possible 
113 pairs in the schools sampled. The sample con- 
tained, respectively, 43.4% and 75.0% of the male 
and female twin pairs. The sample of the present 
study compares favorably in size with the majority 
of twin studies reported in the psychological litera- 
ture. In representativeness, it is superior to the ma- 
jority. The simplest explanation for the preponder- 
ance of girls over boys is the reluctance of adolescent 
boys to volunteer their spare time for taking paper- 
and-pencil tests, especially on Saturdays. The children 
came from 13 different high schools (all that were 
sampled), some of which included a ninth grade, 
and 5 different junior high schools. Participation 
ranged from 8 out of 8 pairs to 3 out of 8 in the 
high schools. There was a tendency toward better 
participation as the economic level of the neighbor- 
hood increased. 
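
The participation percentages can be checked with a short calculation. The counts of 53 male and 60 female pairs in the sampled schools are not stated in the text; they are inferred here from the reported 43.4% and 75.0% and from the total of 113 pairs, and should be read as an assumption.

```python
# Sketch: participation rates for the final sample.
# The per-sex population counts are inferred, not reported in the text.
sample = {"boys": 23, "girls": 45}
population = {"boys": 53, "girls": 60}  # assumed; the two counts sum to the stated 113


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


overall = pct(sum(sample.values()), sum(population.values()))
by_sex = {sex: pct(sample[sex], population[sex]) for sex in sample}

print(overall)  # share of all same-sex pairs in the sampled schools
print(by_sex)   # share of male and of female pairs
```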

After the parents of a twin pair had returned the 
signed authorization for participation and blood 
typing, an appointment was made to drive the pair 
to the Minneapolis War Memorial Blood Bank. An 
appointment was then made for the personality tests. 
The children were tested in small groups ranging up 
to 12 pairs. At the time of testing, the children filled 
out a personal history data sheet, the Minnesota 
Multiphasic Personality Inventory (MMPI), and the 
High School Personality Questionnaire (HSPQ; 
Cattell, Beloff, & Coan, 1958); they were weighed, 
measured for height, fingerprinted, and photo- 
graphed. The entire procedure usually took between 
3 and 4 hours for each group. 

John D. Douthit, Identification Officer with the 
Minnesota Bureau of Criminal Apprehension, finger- 
printed about half of the twins and, after being 
tutored in the technique, the author fingerprinted 
the rest. The Faurot inkless method was used with 
acceptable results and a considerable saving of time. 
It makes use of a colorless fluid and chemically 
sensitive paper. An ordinary bathroom scale was 
used for weighing. Height measurement is estimated 
to be correct within 5 millimeters. Photography was 
done by the author with a 35 mm. camera; both 
a front view and a profile were shot of the head 
and shoulders. 

Some descriptive characteristics of the sample are 
presented in Table 1. By a serological procedure 
described in the next section, the 68 pairs of twins 
were classified into 34 pairs of MZ and 34 pairs of 
DZ. That this split corresponds to genetic theory 
is a stroke of luck and an illustration of the rep- 
resentativeness of the sample. About 90% of the 
twins reported they were of Scandinavian or Western 
European extraction. In addition to obtaining Otis 
IQs on the sample, the same data were obtained for 
30 more pairs of twins who had not volunteered in 
Minneapolis and Robbinsdale so that any selection 
for intelligence might be revealed. It should be noted 


that the total of 98 pairs accounts for information 
about the IQs of 86.7% of the population of same- 
sex twins in the schools used. A sampling bias was 
revealed by the fact that the mean IQ of the non- 
sample twins was 97, while those of the MZ and 
DZ samples were 105 and 108, respectively. A 
t test for the significance of these differences showed 
that both study samples were significantly different 
from the nonvolunteers (t=2.66, p < .01; t=3.66, 
p < .001). 


Criteria for the Diagnosis of Zygosity and an 
Evaluation of their Relative Efficiencies 


One of the most serious criticisms of much twin 
research is the inaccuracy of zygosity diagnosis. In 
reaching a judgment in the past, reliance has been 
placed on an evaluation of the type of birth mem- 
brane or the degree of physical resemblance between 
the twins. Diagnoses based upon the birth membranes
are unreliable because while monochorionic twins
(i.e., a single membrane surrounding both fetuses)
are always MZ, the presence of two chorions 
is known to occur with both MZ and DZ twins 
(Stern, 1960, p. 536). In addition, when studying 
adult or adolescent twins, it is difficult to obtain ac- 
curate information about the birth membrane. Diag- 
nosis by means of the placenta is even more unrelia- 
ble. In evaluating the extent of physical resemblance, 
geneticists have used such traits as sex, height, weight, 


TABLE 1 
DESCRIPTIVE CHARACTERISTICS OF THE SAMPLE 


Character                           MZ     DZ     Combined 
Pairs of boys                       12     11     23 
Pairs of girls                      22     23     45 
Age 
  14                                 0      2      2 
  15                                15      4     19 
  16                                 9     13     22 
  17                                 7     10     17 
  18                                 3      5      8 
Grade 
  9                                  7      5     12 
  10                                15     11     26 
  11                                 7     11     18 
  12                                 5      7     12 
Level of paternal occupation (a) 
  I and II                          15      9     24 
  III                                7     13     20 
  V and VI                          12     12     24 
M Otis IQ                          105    108    107 
IQ SD                               12     12     12 


a The Minnesota Scale for Paternal Occupations, Institute 
of Child Welfare, University of Minnesota. 



eye color, hair color and form, familial appearance, 
and various types of fingerprint or palmprint analy- 
ses. Although there is an unavoidable subjective 
element in evaluating many of these characteristics, 
one expert has estimated the error to be no greater 
than 1 in 10 (Newman, 1940). That this estimate 
may be in error is demonstrated below. 

If twins differ in sex or any other known inherited 
characteristic, they cannot be MZ twins. However, 
if the characteristics are alike, the possibility still 
remains that the twins are DZ. Given a number 
of simply inherited and widely distributed traits, it 
is possible to state the probability of monozygosity 
or dizygosity for a given pair of twins. It is to be 
noted, however, that all such diagnoses of mono- 
zygosity, no matter how many characteristics are 
identical, will always be statements of probability; 
that is, the probability of sharing the given number 
of traits in common. 

Numerous criteria were examined in this study 
with the hope that the various suggestions in the 
literature for the diagnosis of zygosity might be 
objectively evaluated against the recognized best 
method of extensive blood typing recently quantified 
by Smith and Penrose (1955). Thus blood type 
alone was compared with blood type combined first 
with height, second with a difference in total finger- 
print ridge count, and then with both height and 
ridge count. The accuracy of fingerprints alone and 
height alone was ascertained. In addition, three 
groups of judges, geneticists, psychologists, and art- 
ists, looked at photographs of the twins and made 
another form of judgment. 

All blood specimens were drawn and typed by the 
Minneapolis War Memorial Blood Bank, Incorpo- 
rated. The following blood group systems (Race & 
Sanger, 1958) were used: ABO, MNSs, Rh, P, 
Lutheran (Lu), Kell (K), Duffy (Fy), Kidd (Jk), 
and Lewis (Le). 

Smith and Penrose (1955) tabulated the proba- 
bilities used for an objective determination of the 
likelihood of dizygosity based on the incidence of 
phenotypic sib-sib concordance for the above blood 
groups. A specific example will illustrate the basic 
principle underlying the origin of the tabulated prob- 
abilities. Blood typing of a large Caucasian popula- 
tion in Great Britain shows that the frequency of 
Type B blood is .084509 and the frequency of two 
sibs being B is .040062. The probability that if one 
of two sibs is B the other is also becomes
.040062/.084509 or .4741. Since DZ same-sex twins are
genetically as similar as ordinary sibs, the probability 
of DZ twins both being Type B is equally .4741. 
Probabilities derived in this manner are listed in the 
upper part of Table 2 together with the initial proba- 
bility that Caucasian, United States twins are DZ 
and that of a DZ pair being the same sex. Multipli- 
cation of all these independent probabilities results 
in the probability of finding twins who are DZ and 
alike in all the gene loci involved. A more extensive 
discussion of the Smith and Penrose method together 
with the rationale for the derivation of probabilities 
for the morphological traits used can be found else- 
where (Gottesman, 1960). 


TABLE 2 


EXAMPLE OF THE SMITH AND PENROSE METHOD FOR 
ZYGOSITY DETERMINATION 


                                              Independent 
Character                                     relative chance 
Initial odds                                  1.9246 
Likeness in sex                                .5000 
Likeness in ABO                                .6891 
Likeness in MNSs                               .4556 
Likeness in Rh                                 .5021 
Likeness in Le                                 .8681 
Likeness in K                                  .9485 
Likeness in Fy                                 .8036 
Likeness in Jk                                 .8531 
Likeness in Lu                                 .9614 
Likeness in P                                  .5699 
Total relative chance pDZ (blood)              .0470 
Total chance pDZ/(1 + pDZ)                     .0448 
Difference in ridge count                      .2288 
Total relative chance (blood + ridges)         .0107 
Total chance                                   .0106 
Difference in stature                          .4671 
Total relative chance (blood + stature)        .0219 
Total chance                                   .0214 
Total relative chance (all of above)           .0050 
Total chance                                   .0050 
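
The multiplication summarized in Table 2 can be sketched in a few lines. The values are those tabulated above, with decimal points restored; the variable and function names are illustrative only and do not come from Smith and Penrose (1955). Products computed from the four-decimal entries may differ from the tabulated totals in the last digit.

```python
from math import prod

# Sketch of the Table 2 arithmetic; names are illustrative, not from the source.
INITIAL_ODDS_DZ = 1.9246  # tabulated initial odds
LIKENESS = {              # relative chance of likeness for a DZ pair
    "sex": 0.5000, "ABO": 0.6891, "MNSs": 0.4556, "Rh": 0.5021, "Le": 0.8681,
    "K": 0.9485, "Fy": 0.8036, "Jk": 0.8531, "Lu": 0.9614, "P": 0.5699,
}
RIDGE_COUNT = 0.2288
STATURE = 0.4671


def total_chance(relative_chance):
    """Convert a relative chance of dizygosity into a probability, p / (1 + p)."""
    return relative_chance / (1 + relative_chance)


# Worked example from the text: chance that a sib is Type B given that the other is.
p_b_given_b = 0.040062 / 0.084509

rel_blood = INITIAL_ODDS_DZ * prod(LIKENESS.values())
rel_blood_ridges = rel_blood * RIDGE_COUNT
rel_blood_stature = rel_blood * STATURE
rel_all = rel_blood * RIDGE_COUNT * STATURE

for label, rel in [("blood", rel_blood), ("blood + ridges", rel_blood_ridges),
                   ("blood + stature", rel_blood_stature), ("all", rel_all)]:
    print(f"{label}: relative {rel:.4f}, total chance {total_chance(rel):.4f}")
```

For this example pair, the total chance of dizygosity from blood alone is about .045, so the chance of monozygosity is about .955, which agrees with a diagnosis of MZ that is correct no less than 95 times in 100.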


As a result of the blood typing, 34 pairs of twins 
were diagnosed as definitely DZ, that is, they differed 
on at least one of the independently inherited blood 
groups. Using only blood, the remaining 34 pairs were 
diagnosed as MZ with the probability of accuracy 
no less than 95 times in 100. Table 3 summarizes 
the results of the accuracy for the various combina- 
tions of blood and physical characteristics. It seems 
paradoxical that while additional information in- 
creased the accuracy of some of the diagnoses, it 
was at the expense of serious errors, e.g., using all 
the characters resulted in calling 22 pairs MZ at the 
.01 level or better, but at the expense of 6 pairs failing
to meet the criterion of the .05 level. The primary 
reasons for this are the lack of cross validation and 
the small samples upon which the fingerprint and 
height probabilities are calculated, 52 and 50 pairs of 
MZ, respectively. This resulted in a range of within- 
pair differences too narrow to allow for those found 
in the present sample of MZ twins. The probability 
figure given was too much in the DZ direction to 
be overcome by any amount of additional information.
Eight pairs of MZ twins had differences of 
ridge count which were tabulated at probabilities 
greater than 1.0 in favor of the DZ contingency. 
Similarly, five pairs were “penalized” for differences 
in height larger than the tabulated ones for the 50 
criterion pairs of MZ twins on which they were 
based. Differential growth rates during adolescence 




may have been another attenuating factor in the use 
of the probabilities attached to differences in height. 

Let us turn now to two different analyses of the 
fingerprints: one clinical and the other statistical. 
Given the 68 pairs of fingerprints and no information 
as to the base rates, i.e., incidence of DZ and MZ 
twins in the sample, how accurately can an expert 
diagnose the two kinds of twins? Douthit undertook 
this task and was able to correctly identify 30 MZ 
pairs and 23 DZ pairs. Most important in his clinical 
decisions were three components (Cummins & Midlo, 
1943): differences in pattern slope for the same finger 
in a twin pair, similarities in slope but different 
patterns, and differences in the range of dermal 
ridges for paired fingers. His decisions were not 
purely clinical in the Meehl (1954) sense of the word 
in that he subjectively assigned different weights to 
these components, and used the scores a pair of prints 
obtained. The statistical method used was for the 
author to assign what appeared to be the optimum 
cutting score (Meehl & Rosen, 1955) to the distri- 
bution of differences between total ridge count. This 
cutting score was then cross validated by applying it 
to the original distribution (Smith & Penrose, 1955) 
from which the aforementioned probabilities were 
determined. A cutting score of 30 classified 33 of 
34 MZ pairs and 20 DZ pairs correctly. This score 
correctly classified 51 of the original 52 MZ pairs at 
the expense of misclassifying 39 of 101 (38.6%) like- 
sex siblings. The clinical and statistical methods tied 
in their accuracy for diagnosing the entire present 
sample with both hitting 78%. Both Newman, 
Freeman, and Holzinger (1937) and Slater (1953) 
make use of some aspects of fingerprints in their 
diagnosis of zygosity. 
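
The hit rates quoted for the two fingerprint methods follow directly from the counts given above. The sketch below recomputes them. The classification function is hypothetical: the text gives the cutting score of 30 but not whether a difference of exactly 30 was called MZ or DZ, so the strict inequality is an assumption.

```python
# Sketch: hit rates for the clinical and statistical fingerprint diagnoses.
N_MZ, N_DZ = 34, 34


def hit_rates(mz_correct, dz_correct, n_mz=N_MZ, n_dz=N_DZ):
    """Return (MZ %, DZ %, total %) as whole percentages."""
    mz = 100 * mz_correct / n_mz
    dz = 100 * dz_correct / n_dz
    total = 100 * (mz_correct + dz_correct) / (n_mz + n_dz)
    return round(mz), round(dz), round(total)


def classify_by_ridge_difference(difference, cutting_score=30):
    """Hypothetical rule: small within-pair ridge-count differences are called MZ.
    The treatment of a difference equal to the cutting score is assumed."""
    return "MZ" if difference < cutting_score else "DZ"


clinical = hit_rates(30, 23)      # expert judgment
statistical = hit_rates(33, 20)   # cutting score applied to ridge-count differences
sib_error_pct = round(100 * 39 / 101, 1)  # like-sex sibs misclassified on cross validation

print(clinical, statistical, sib_error_pct)
```

The two methods reach the same overall accuracy by different routes: the cutting score favors correct identification of MZ pairs at the cost of more misclassified DZ pairs.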

Judgments of photographs constituted the final 
method of zygosity determination evaluated in this 
section. A summary of all the methods attempted 
is then presented. Three groups of judges were utilized:
three geneticists, three child psychologists, and 
three artists.⁴ Although the front and profile pictures 
of the head were black and white 35 mm. contact 
prints, expressions of dissatisfaction with their 
quality were minimal. 

It is obvious that previous estimates (e.g., New- 
man, 1940) of a 10% error in the diagnosis of 
zygosity by general appearance are subject to doubt. 
Jackson’s (1960) contention that he observed a 
“striking difference” between photographs of MZ and 
DZ twins in the literature needs to be re-evaluated. 
Even allowing for the quality of photographs and the 
absence of the cues from the twins’ physical presence, 
the median accuracy of 72% for all nine judges was 
significantly less than an expected 90% (χ² = 24.33, 
p < .001). Poor reliability of judgments may be 
inferred from the fact that for only 13 MZ pairs 
and 14 DZ pairs were there one or no inaccurate 
judgments. There was a total of 84 errors in judging 


4 I am grateful for the assistance of Vivian Phillips, 
Elizabeth Reed, S. C. Reed, J. E. Anderson, Mildred 
C. Templin, R. D. Wirt, Carol Safer, L. Safer, and 
Ane Wolfe Graubard. 


the MZ twins and 70 errors in judging the DZ twins. 
Judging the MZ girls seemed to be the most difficult. 
To the extent that the data in Table 3 were stable, 
only the geneticists made sufficient allowance for 
the variability that existed between MZ twins. 
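
The comparison of the 72% median accuracy with the expected 90% can be reproduced as a one-degree-of-freedom goodness-of-fit test. The sketch assumes that the test was run on the 68 pairs, with 72% taken as 49 correct judgments; the text does not state these details, but they yield a value close to the reported 24.33.

```python
# Sketch: chi-square goodness-of-fit test of observed against expected accuracy.
# Assumes n = 68 pairs and 49 correct judgments (72% of 68, rounded).
def chi_square_gof(observed, expected):
    """Pearson chi-square statistic for matched observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


N_PAIRS = 68
correct = round(0.72 * N_PAIRS)                  # correct judgments at the median accuracy
observed = [correct, N_PAIRS - correct]          # correct, incorrect
expected = [0.90 * N_PAIRS, 0.10 * N_PAIRS]      # counts under 90% accuracy

chi2 = chi_square_gof(observed, expected)
CRITICAL_001_DF1 = 10.828  # chi-square critical value, df = 1, p = .001

print(f"chi-square = {chi2:.2f}, exceeds .001 critical value: {chi2 > CRITICAL_001_DF1}")
```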


Summary 


For the sake of clarity, the accuracy of zygosity 
determination for all the methods described thus far 
is presented in Table 3. 

It should be noted that the three columns cannot 
be evaluated independently of one another. A judge 
of the photographs or fingerprints could maximize 
his accuracy in one category at the expense of the 
other. It is the final column which conveys the most 
meaning. The blanks in the table derive from the 
fact that the DZ twins were absolutely removed 
from further consideration by the blood typing 
methodology in the Smith and Penrose scheme. 


Psychometric Devices 


Both instruments used to measure personality in 
this study come under the category of objective as 
contrasted with projective tests. Within the former 
group there are two major types of questionnaires: 
one is derived empirically from its ability to dis- 
criminate among behavioral phenotypes and the 
scales may be said to have functional unity, and 
another type is derived by factor analysis and the 
scales may be said to have statistical unity. The 
MMPI was selected as an example of the first type 
and the HSPQ of the second. 

Widespread usage of the MMPI precludes the 
necessity for a detailed description (Dahlstrom & 
Welsh, 1960). The test was constructed to provide, 


TABLE 3 
ACCURACY SUMMARY FOR METHODS OF ZYGOSITY 
DETERMINATION 

Method                               MZ%    DZ%    Total % 
Blood (p=.05)                        100    100    100 
Blood + Height                        85     -      - 
Blood + Ridge count                   88     -      - 
Blood + Height + Ridge count          82     -      - 
Height + Ridge count                 (0)     -      - 
Fingerprints 
  Clinical                            88     68     78 
  Statistical                         97     59     78 
Photos 
  Best geneticist                     97     74     85 
  Best psychologist                   59     82     71 
  Best artist                         79     91     85 
  Pooled judges (6/9 agreement)       68     88     78 




in a single instrument, measures of all the more 
important phases of personality of interest to the 
psychiatrist. Items were selected from 26 subject- 
matter categories, e.g., general health, sensory dis- 
turbances, family problems, sexual and social atti- 
tudes, masculine and feminine interest patterns, and 
schizophrenic thinking disturbances. The 550 items 
were answered true or false. Use of the MMPI with 
normal adolescents may be questioned, but this 
practice is becoming more popular as experience 
accumulates with this age group (Hathaway & 
Monachesi, 1961; Wirt & Briggs, 1959). A basic 
assumption of the present research was that per- 
sonality scales represent dimensions or continua of 
behavior, not categories. Such scales thus lend them- 
selves to the assumptions of quantitative inheritance 
and permit the use of nonpsychiatric subjects. Num- 
bers have been assigned to the MMPI scales instead 
of their Kraepelinian based names by the developers 
of the test so as to facilitate fresh associations to 
the meaning of the scales as personality constructs. 
There is no simple translation from MMPI data into 
descriptive terms for normal populations, but ad- 
jectives associated with the behavior of subjects with 
different scale patterns are easily obtained (e.g., 
Black, 1953). 

The group form of the test was used and scored in 
the usual fashion for the four validity indicators and 
10 clinical scales, 1 through 0 (or Hypochondriasis 
—Hs—through Social Introversion—Si). In addi- 
tion, 6 experimental scales (Hathaway & Briggs, 
1957), Ego Strength (Es), Anxiety (A), Repression 
(R), Dominance (Do), Dependency (Dy), and 
Social Status (St) were scored and analyzed. The 
5 scales requiring a K correction were analyzed after 
the correction had been made. 

Test-retest reliability coefficients on the 10 stand- 
ard MMPI scales for a sample of 100 male and 
female college students after a 1-week interval 
(Dahlstrom & Welsh, 1960, p. 472) range from .56 
to .90. These particular coefficients were chosen because
they are most comparable to the circumstances 
under which the HSPQ reliabilities were computed. 

The HSPQ is new to the literature on personality 
tests and requires more exposition than the MMPI. 
Cattell, Beloff, and Coan (1958) constructed this 
instrument by factor analysis especially for adoles- 
cents 12 through 17 years in the tradition of the 
Cattell (1946, 1950) laboratory. It is said to cover 
all the major dimensions involved in any compre- 
hensive view of individual differences in personality 
(Cattell et al., 1958). 

It consists of 280 forced-choice items, all of which 
are scored, which form 14 independent, equal length, 
scales. Although printed in two forms of 140 items 
each, the authors⁵ recommend the use of both to 
obtain sufficient reliability. It is also suggested that 
raw scores rather than standard scores be used for 
research purposes and this suggestion was followed. 
The scale designations and their titles are given in 


5 R. B. Cattell, personal communication, February 
1959. 


Table 4. Test-retest correlations based on 112 chil- 
dren aged 13 through 15 tested 2 weeks apart with 
the full test range from .68 to .80. 

In the opinion of the constructors, validity for 
the 14 scales is satisfactorily established. The main 
technique used to demonstrate this is the computa- 
tion of a multiple correlation from factor-item cor- 
relations. This gives a median r of .81. Although 
no correction for Test-Taking Attitude is used, there 
are equal numbers of “yes” and “no” keyed answers 
on each scale. 

It should be obvious that the reliabilities of the 
MMPI and the HSPQ are on much firmer ground 
than the validities. Unfortunately, the magnitude 
of the test-retest correlations has no direct bearing 
on the construct validity of a scale (Loevinger, 
1957). An inherent difficulty in measuring person- 
ality traits is the observation that they change with 
the passage of time and with intervention. After 
the data of the present study are analyzed, there 
should be more evidence for the validity, or lack 
thereof, of the various scales. At the least one might 
expect significant correlations between MZ twin 
siblings. Another speculation would be the absence 
of negative correlations between either class of twins 
unless there were some parsimonious explanation of 
a within-pair interaction on a trait. 


TABLE 4
HSPQ SYMBOLS AND TITLESᵃ FOR TEST DIMENSIONS

Symbol  Low score               High score
A       Stiff, Aloof            Warm, Sociable
B       Mental Defect           General Intelligence
C       General Neuroticism     Ego Strength
D       Phlegmatic Temperament  Excitability
E       Submissiveness          Dominance
F       Sober, Serious          Enthusiastic
G       Casual, Undependable    Super Ego Strength
H       Shy, Sensitive          Adventurous, Thick-Skinned
I       Tough, Realistic        Esthetically Sensitive
J       Liking Group Action     Fastidiously Individualistic
O       Confident Adequacy      Guilt Proneness
Q₂      Group Dependency        Self-Sufficiency
Q₃      Uncontrolled, Lax       Controlled, Showing Will Power
Q₄      Relaxed Composure       Tense, Excitable

ᵃ A mixture of technical and popular terms was used here.


Twin Method


Bacteria, fruit flies, and mice have contributed
greatly to the body of genetic knowledge, but the
application of this knowledge to the causes of varia-
tion in human behavior raises difficulties. Moreover,
relatively few direct methods are available to the
researcher in human behavior genetics because of
such problems as those introduced by uncontrolled 
mating, small numbers of offspring, heterogeneous 
environments, and the uniqueness of one individual’s 
heredity. Of the available methods, the twin method 
approaches the ideal experimental design. Galton 
(1875) first called attention to the possible usefulness 
of twins for casting light on the nature-nurture 
problem. The underlying principle is simple and 
sound: since MZ twins have identical genotypes, any 
dissimilarity between pairs must be due to the action 
of agents in the environment, either postnatally or 
intrauterine; DZ twins, while differing genetically, 
have certain environmental similarities in common 
such as birth rank and maternal age, thereby pro- 
viding a measure of environmental control not other- 
wise possible. When both types of twins are studied, 
a method of evaluating either the effect of different 
environments on the same genotype or the expres- 
sion of different genotypes under the same environ- 
ment is provided. This means, with respect to any 
given genetically determined trait, that there should 
be a greater similarity between MZ than between
DZ twins. If both members of a twin pair develop 
the same phenotype in a given environment, they 
are called concordant for the trait under study; 
discordant is the designation for differing phenotypes. 
When dealing with a single gene difference, such as 
Huntington’s chorea, MZ twins should always be 
concordant; DZ twins may be either concordant or 
discordant. The expected difference in concordance 
can then be used to give a measure of heritability 
(H) of the trait if the traits are amenable to discrete 
classification. 

Inasmuch as few traits in the normal range of 
human personality are dichotomous, another ap- 
proach is needed to estimate H when traits are con- 
tinuous and the genetic component is of the poly- 
genic variety. In the present research, H will be 
defined as the proportion of total trait variance 
associated with genetic factors. Holzinger (1929) 
suggested that the best comparison to make in evalu- 
ating the nature-nurture interaction for a quantita- 
tive characteristic is that between the intraclass 
correlation coefficients (R) for MZ and like-sexed 
DZ twins. Holzinger’s H gives the proportion of 
variance produced by genetic differences within 
families. The method underestimates the effects of 
heredity in the general population by a factor of 
approximately 2 since the genetic variance is esti- 
mated from the genetic overlap between DZ twins 
which is .5. The index of heritability, h², computed
in animal behavior genetics (e.g., Falconer, 1960;
McClearn, 1961) is different from H in that it is
an estimate of the proportion of trait variance in a
population determined by genotypic variation in that
population. Both between- and within-families vari-
ance components are used to compute h².

Holzinger gave two formulas for his estimate of 
heritability, one based on R and another, statistically 
equivalent, based on the within-pair variances. 


\[ H = \frac{R_{MZ} - R_{DZ}}{1 - R_{DZ}} \]

\[ H = \frac{V_{DZ} - V_{MZ}}{V_{DZ}} \]

where,
$R_{MZ}$ = intraclass correlation between MZ twins
$R_{DZ}$ = intraclass correlation between DZ twins
$V_{MZ}$ = within-MZ pairs variance estimate (mean square)
$V_{DZ}$ = within-DZ pairs variance estimate (mean square)


Falconer has suggested that the difference between 
the MZ and DZ Rs could be taken as an estimate of 
half the heritability if there were no nonadditive 
genetic variance; since the latter assumption is proba- 
bly not warranted, the difference can only be 
regarded as setting an upper limit to half the 
heritability. 
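
The two forms of Holzinger's estimate can be illustrated with a minimal sketch. The function names and the sample mean squares below are hypothetical and are not taken from the study; reporting a negative estimate as .00 is an assumption about how the index is tabled, not a requirement of the formulas.

```python
def h_from_correlations(r_mz, r_dz):
    """Holzinger's H from intraclass correlations: (R_MZ - R_DZ) / (1 - R_DZ)."""
    return (r_mz - r_dz) / (1.0 - r_dz)


def h_from_variances(v_mz, v_dz):
    """Holzinger's H from within-pair mean squares: (V_DZ - V_MZ) / V_DZ."""
    return (v_dz - v_mz) / v_dz


def h_from_f(f_ratio, floor_at_zero=True):
    """H written in terms of F = V_DZ / V_MZ, since (V_DZ - V_MZ) / V_DZ = 1 - 1/F.

    floor_at_zero is an illustrative assumption: when F is below 1 the
    negative estimate is reported as 0.0.
    """
    h = 1.0 - 1.0 / f_ratio
    return max(h, 0.0) if floor_at_zero else h


# Hypothetical within-pair mean squares, chosen only for illustration.
v_mz, v_dz = 20.0, 50.0
print(round(h_from_variances(v_mz, v_dz), 2))  # 0.6
print(round(h_from_f(v_dz / v_mz), 2))         # 0.6, the same quantity via F
```

Writing H as 1 - 1/F makes the link to the significance test explicit: an F ratio of 3.42, for example, corresponds to an H of about .71.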


Limitations and Criticisms of the Twin Method 


After reviewing probable natal and prenatal in- 
fluences on twin development, Price (1950) was will- 
ing to conclude, 


In all probability the net effect of most twin 
studies has been underestimation of the significance 
of heredity in the medical and behavior sciences 
[p. 293]. 


This appears to be the result of biases of two sorts. 
Inferences drawn from data on twins are subject to 
both statistical and biological biases. In the first 
category, it is basic to the kinds of analyses discussed 
above that the samples be proportional and therefore 
representative of the population of MZ and like- 
sexed DZ twins before the concordance or the vari- 
ance used in the formulas can be assumed to be valid 
enough to support the inferences. This assumption is 
difficult to meet. Use of the twin method also as- 
sumes that the within-pair environmental variance 
is the same for the two types of twins. This is not 
necessarily true for the personality traits as measured 
by the tests, but one can proceed only on the as- 
sumption that such variance is not too different for 
the two types of twins. Loevinger (1943) mentioned 
some additional difficulties underlying the use of 
the variance method, chief among which were the 
assumption that influences combine additively and 
the assumption that estimates of the error vari- 
ance are eliminated from the computation of H. Again 
the extent to which these conditions are met is hard 
to assess. Cattell (1953) concluded that approxima- 
tions of a solution to the nature-nurture issue, with 
an awareness of methodological shortcomings, were 
better than postponing all research in the area. The 
author is inclined to agree. 

Biological biases have been reviewed by Price 
(1950) who divided them into natal factors (e.g.,
position in utero), lateral inversions, and effects of
mutual circulation. No attempt to evaluate these fac- 
tors will be made since data are not available. Post- 
natal biological biases are often overlooked on the 
apparent assumption that the general environment 
for a pair of twins is the same. Once more the as-
sumption is questionable. Should one of a pair, for
example, contract some form of encephalitis with
its well-known sequelae, the results on personality 
measurement would be obvious. 

The main limitations of twin studies were viewed 
in somewhat different terms by Kallmann and Baroff
(1955) as the following: 


(a) twins cannot be separated before they are 
born, nor can they be provided with two mothers 
of different age, personality, or health status; 
(b) two-egg twins are no more dissimilar geno- 
typically than brothers and sisters and like them, 
are rarely raised in different cultures; therefore, 
even fraternal twins are unlikely to fall into the 
extremes of theoretically possible genetic and cul- 
tural differences; and (c) the average difference 
between one-egg twin partners is no precise meas- 
ure of environmentally produced variation, nor 
does an increase over the average difference be- 
tween two-egg twins represent the exact contribu- 
tion of genetic influences even in relatively com- 
parable environments [p. 303]. 


Intraclass Correlation Analysis of 
Personality Traits 


Following the diagnosis of zygosity and the collec- 
tion of the personality test data, each scale of the 
two tests and the IQ from the school records were 
analyzed by means of the intraclass correlation co- 
efficient, first for the two classes of twins and then 
for the two sexes within each class. A total of 186
coefficients was obtained from 186 simple one-way 
analyses of variance using T scores for the MMPI, 
raw scores for the HSPQ, and Otis IQs. 

Haggard’s (1958) book on the intraclass correla- 
tion gives a detailed exposition of the method used 
here. Although the intraclass correlation was for- 
merly computed by calculating the interclass correla- 
tion after constructing a symmetrical table with 
double entries for a pair of scores and then dividing 
by 2, it now is recognized as a simple function of 
variances. Haggard (1958, p. 11) gives this formula 
for the computation: 


\[ R = \frac{BCMS - WMS}{BCMS + WMS} \]


where, 
BCMS = between-classes (twin pairs) mean 
square 
WMS = within-classes (twin pairs) mean 
square 


This means that the unbiased estimate of R may be 
obtained in terms of the mean squares (i.e., variance 
estimates) of the analysis of variance table. This 
formula is the specific one to use for pairs of scores. 
The relationship of F, the variance ratio, to R is 
given by: 

\[ F = \frac{1 + R}{1 - R} \]

The level of statistical significance of R is identical
with that of the corresponding F (i.e., BCMS/
WMS). In other words, the hypothesis that an ob- 


served R could have come from a population with 
a true correlation of zero can be tested by the F 
ratio computed from the same mean squares, with 
the appropriate degrees of freedom, as were used to 
obtain R. 
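
A minimal sketch of this computation for twin pairs follows. The function name and the four score pairs are hypothetical; the mean squares follow the one-way analysis of variance described above, with k = 2 scores per class.

```python
def intraclass_r(pairs):
    """Return (R, F) for a list of (twin_a, twin_b) scores.

    R = (BCMS - WMS) / (BCMS + WMS) and F = BCMS / WMS, where each pair
    is one class of k = 2 scores and c = len(pairs) is the number of classes.
    """
    c = len(pairs)
    grand_mean = sum(a + b for a, b in pairs) / (2 * c)
    # Between-classes sum of squares: k times the squared deviation of each pair mean.
    bcss = sum(2 * ((a + b) / 2 - grand_mean) ** 2 for a, b in pairs)
    # Within-classes sum of squares: for a pair this reduces to (a - b)^2 / 2.
    wss = sum((a - b) ** 2 / 2 for a, b in pairs)
    bcms = bcss / (c - 1)  # between-classes mean square, c - 1 df
    wms = wss / c          # within-classes mean square, c(k - 1) = c df
    return (bcms - wms) / (bcms + wms), bcms / wms


# Hypothetical scores for four twin pairs.
pairs = [(10, 12), (14, 15), (20, 18), (8, 9)]
r, f = intraclass_r(pairs)
print(round(r, 3), round(f, 1))       # 0.942 33.2
print(round((1 + r) / (1 - r), 1))    # 33.2, the same F recovered from R
```

The last line checks the stated relationship between F and R on the same data.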

In order to test the significance of the differences 
between two independently obtained Rs, they were 
converted into Fisher’s z (Fisher & Yates, 1949) 
which has an approximately normal distribution with 
variance: 

\[ \frac{k}{2(c-2)(k-1)} \]

where,
k is the number of individuals within a class, i.e.,
2 (MZ or DZ twins), and c is the number of
classes, i.e., 34 (pairs).

The distribution of the difference between the cor-
responding z values is approximately normal with
variance:

\[ V = \frac{k_1}{2(c_1-2)(k_1-1)} + \frac{k_2}{2(c_2-2)(k_2-1)} \]


Dividing the difference between z’s by the square
root of the above gives a normal deviate, the p value 
of which is found in the usual manner. A one-tailed 
test of significance was appropriate and was used. 
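
The test can be sketched as follows. The function names are hypothetical, and the sketch assumes the pair (k = 2) form of Fisher's z, 0.5 ln[(1 + R)/(1 - R)]. The illustrative inputs are the total-group Rs reported for Hs in the Results (.39 for MZ, .21 for DZ, 34 pairs each).

```python
import math


def fisher_z(r):
    """Fisher's z for an intraclass R based on pairs (k = 2)."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def z_variance(c, k=2):
    """Approximate variance of z as given in the text: k / (2 (c - 2)(k - 1))."""
    return k / (2.0 * (c - 2) * (k - 1))


def mz_dz_difference_test(r_mz, c_mz, r_dz, c_dz):
    """Return (normal deviate, one-tailed p) for the hypothesis R_MZ > R_DZ."""
    standard_error = math.sqrt(z_variance(c_mz) + z_variance(c_dz))
    deviate = (fisher_z(r_mz) - fisher_z(r_dz)) / standard_error
    # Upper-tail area of the standard normal distribution.
    p_one_tailed = 0.5 * math.erfc(deviate / math.sqrt(2.0))
    return deviate, p_one_tailed


deviate, p = mz_dz_difference_test(0.39, 34, 0.21, 34)
print(round(deviate, 2), round(p, 2))  # 0.79 0.21
```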

Recapitulating, the objectives of this intraclass 
correlation analysis of traits are (a) to demonstrate 
that the traits are significantly and positively cor- 
related in MZ twins and may or may not be in DZ 
twins and (b) to demonstrate that for any geneti- 
cally influenced trait the correlation within MZ 
pairs will be significantly greater than that within 
DZ pairs. 

Subsequent to this analysis, the heritability indexes 
were computed as described in the previous section 
using the independently obtained WMS or within 
variances. It should be noted that the two proce-
dures, intraclass correlation analysis and computation
of heritability indexes, involve simple and complex 
assumptions, respectively. The correlation analysis 
is sufficient to show that heredity has something to 
do with individual differences. Estimates of the rela- 
tive importance of nature and nurture as indicated 
by H must be considered as suggestive rather than 
definitive in human behavior genetics. 


Configural (Holistic) Analyses of 
Personality Similarity 


Following a scale-by-scale analysis, one of the two 
personality tests was selected for holistic profile 
analyses. The MMPI was chosen because MMPI 
configurations have been treated extensively in the 
literature. Recent emphasis on the study of profiles 
has resulted from the realization that interpretation 
of an individual’s set of scores must frequently be 
based on the pattern of scores rather than examina- 
tion of one scale at a time or the use of a linear sum 
of the scale deviations. General and specific meth- 
odological difficulties arise which weaken any con- 
fidence that may be attached to the quantification 
of profile similarity. Only a few of the difficulties 


10 Irvine I. GOTTESMAN 


noted by students of the problem (Cronbach & 
Gleser, 1953; Osgood & Suci, 1952) will be discussed. 

Similarity as a general quality of personality is 
nebulous but necessary for communication. Cron- 
bach and Gleser (1953) say: 


similarity is not a general quality. It is possible 
to discuss similarity only with respect to specified 
dimensions (or complex characteristics). This 
means that the investigator who finds that people 
are similar in some set of scores cannot assume that 
they are similar in general. He could begin to 
discuss general similarity only if his original meas- 
urement covered all or a large proportion of the 
significant dimensions of personality [p. 457]. 


Other general methodological difficulties involve the 
loss of information by reducing the relationship be- 
tween two configurations to a single index; lack of 
comparability between indexes of similarity; and 
violations of assumptions about ratio scales, uncor- 
related measures, and equal reliability among subtests. 

There are two aspects of profiles which matching
may involve: the shape or configuration of scores
and the general elevation from the mean of the
norm group. It is logical to distinguish between
matching for absolute agreement, in which both
shape and elevation are considered, and relative
agreement, in which only shape is considered. Three
statistical indexes and one clinical index of similarity
were computed for the two classes of twins. In addition,
the profile of each twin was coded according to the 
methods of both Hathaway and Welsh (Dahlstrom 
& Welsh, 1960) to facilitate further clinical assess- 
ment (Gottesman, 1960). 


Statistical Indexes 


Rank-Difference Correlation. This well-known
measure, Spearman’s rho, was the first index com-
puted. It yielded a nonarbitrary number which re-
flected similarity of shape but disregarded elevation.
One of its disadvantages was that a rho of 1.00 did
not necessarily indicate perfect similarity and another
was that two pairs of profiles with the same coeffi-
cient need not be equally similar. Rhos were calcu-
lated from the Welsh codes; ties were resolved by
using the scales in numerical order.
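
The point that a rho of 1.00 need not mean perfect similarity can be seen in a small sketch. The function names and the two profiles are hypothetical, and the ranking helper assumes no tied scores.

```python
def ranks(scores):
    """Rank scores from highest (rank 1) to lowest; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(profile_a, profile_b):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(profile_a)
    d_squared = sum(
        (ra - rb) ** 2 for ra, rb in zip(ranks(profile_a), ranks(profile_b))
    )
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical T scores on 10 scales; twin_b has the same shape, 10 points higher.
twin_a = [52, 61, 48, 70, 55, 58, 64, 66, 45, 50]
twin_b = [score + 10 for score in twin_a]
print(spearman_rho(twin_a, twin_b))  # 1.0
```

The two profiles differ by 10 T-score points on every scale, yet rho is 1.0 because only the rank order of the scales enters the index.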

D Coefficient. This index, sometimes known as 
the generalized distance function, was then computed. 
Cronbach and Gleser (1953) devote considerable 
attention to this index which is designed to reflect
both shape and elevation. The D coefficient is based
on the geometric principle that in a space of N 
mutually orthogonal dimensions, the distance be- 
tween two points is equal to the square root of the 
sum of the squared differences between the coor- 
dinates of the points on each dimension. Since pro- 
files may be considered as points in N space, where 
N equals the number of scales (i.e., 10), the distance 
between them serves as a measure of similarity. Note 
that orthogonality does not obtain for the MMPI. 
The D coefficient results in an arbitrary number 
whose value depends on the number of scales. 
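
As a minimal sketch of the distance computation, with a hypothetical function name and hypothetical T-score profiles on 10 scales:

```python
import math


def d_coefficient(profile_a, profile_b):
    """Generalized distance: square root of the summed squared scale differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Hypothetical T scores on 10 scales.
twin_a = [50, 55, 60, 65, 50, 55, 60, 65, 50, 55]
twin_b = [53, 51, 60, 65, 50, 55, 60, 65, 50, 55]  # differs by 3 and 4 on two scales
shifted = [score + 10 for score in twin_a]         # same shape, higher elevation

print(d_coefficient(twin_a, twin_b))              # 5.0
print(round(d_coefficient(twin_a, shifted), 2))   # 31.62
```

Unlike a rank-based index, D grows with a uniform shift in elevation, and it also grows with the number of scales entered, which is why its absolute size is arbitrary.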


Concordance of Test Behavior (TT’). In the con- 
text of discovery it was decided to compute the 
absolute percentage of MMPI items answered in the 
same direction by a pair of twins, i.e., one twin’s 
answer sheet was used to score the other’s. Of course 
the MMPI was not designed to be used this way and 
in this instance serves primarily as an item pool. 
The percentage of agreement for the 566 items has 
been termed TT’, to signify the comparison of one 
twin with his sibling. No provision was made for 
the few items which are repeated, but any question 
omitted by either twin was subtracted from 566 
before the percentage was calculated. This process 
was then repeated using only those items appearing 
on the 10 clinical scales (337 items).⁶
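
The agreement percentage can be sketched as below. The function name and the coding of answers (True, False, or None for an omitted item) are assumptions made for illustration.

```python
def tt_prime(answers_a, answers_b):
    """Percentage of items that both twins answered in the same direction.

    Each answer is True, False, or None (omitted). An item omitted by
    either twin is dropped from the denominator, as described in the text.
    """
    answered = [
        (a, b) for a, b in zip(answers_a, answers_b)
        if a is not None and b is not None
    ]
    if not answered:
        raise ValueError("no item was answered by both twins")
    agreements = sum(1 for a, b in answered if a == b)
    return 100.0 * agreements / len(answered)


# Hypothetical six-item example; two items are dropped because of omissions.
twin_a = [True, False, True, None, True, False]
twin_b = [True, True, True, False, None, False]
print(tt_prime(twin_a, twin_b))  # 75.0
```

Restricting the index to the clinical scales amounts to passing only the answers to those items.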


Clinical Index 


Visual Judgment. The only quantifiable clinical 
index of similarity used was the accuracy of visual 
judgment in sorting the profiles into four categories: 
Very Similar, Similar, Dissimilar, and Very Dis- 
similar. By accuracy was meant the number of MZ 
pair profiles placed in the first two categories and 
the number of DZ in the last two. Three psycholo-
gists⁷ skilled in the use of the MMPI were the
judges. Another indication of similarity was pro- 
vided by comparison of the accuracy of visual judg- 
ments in the extreme categories with the overall 
accuracy. 

Recapitulating, the objective of each of the above 
four procedures was to search for a greater similarity 
of personality, as measured by the configural aspects 
of the MMPI, for the MZ twins than for the DZ 
twins.


RESULTS 


The presentation of the findings is or- 
ganized around the two instruments of 
assessment; first the MMPI with its 10 
clinical scales, 6 experimental scales, and K 
(a validity scale thought to have personality 
referents); and second the HSPQ with its 
14 factored scales. Indexes of H are pre- 
sented separately after each correlational 
analysis. Results are displayed for the total 
sample of 34 pairs each of MZ and DZ twins, 
then for the female subsample (22 MZ and 23 
DZ pairs) and then for the male subsample 
(12 MZ and 11 DZ pairs). The results by sex 
are provocative but any extensive discussion of 


6 Hathaway’s linear statistic CC’ was also calcu-
lated with no improvement over any of the statis-
tical methods reported here. Data are available upon 
request. 

7 Thanks are due Jan Duker, H. Gilberstadt, and 
R. D. Wirt. 




TABLE 5

MMPI INTRACLASS CORRELATIONS FOR MZ AND DZ
TWINS FOR TOTAL GROUP, FEMALES, AND MALES

          Total groups’ R    Females’ R    Males’ R
Scales      MZ      DZ       MZ     DZ     MZ    DZ


1 Hs Gian) krii 14 19 44 25 
2D 47 PESTO? 44% 25 48* —19 
3 Hy 474i 
4Pd STERLE 45* 25 
5 Mf 524432% 52A TiS 55* 28 
6 Pa 44** 18 44* 31 43 —10 
7Pt 55#** | 20 
8 Sc 59**k 19 so** 12 6576119 
9Ma 24 —07 29 —28 11 42 
0 Si 55%**_ 08 37s 18 73** —04 


K 32* —02 30 03 40 —26 
Es 25 47** = 17 49%  48* 30 
A 45** = 04 46* 12 42 —35 
R 29* .22 20 15 50* 42 
Do 46** 21 22 40* 72** — 44 
Dy S20E25. some  47** 28. —37 
St AINA 53¥** i28 63*** 72**. 24 


them is vitiated by the size of the samples. 
This section concludes with the results of the 
configural and holistic MMPI profile analyses. 
Thorough discussion of the results is deferred 
to the next section. 


Minnesota Multiphasic Personality Inventory 


The excellent matching of the two classes 
of twins and their representativeness of ado- 
lescents in general were observed on the mean 
scores and standard deviations of the 3 va- 
lidity scales and the 10 clinical scales. Intra- 
class correlation coefficients for the twins are 
given in Table 5. Nine of the 10 standard 
MMPI scales were significantly different from 
zero at the .01 level for the MZ twins. Fifteen 
of the 17 MMPI MZ scale correlations were 
larger than the corresponding correlations for 
DZ twins. It should be noted that for 6 of the 
8 total MZ scale correlations for which test- 
retest reliability data on adolescents were 
available, the magnitudes are about the same. 

Rozeboom (1960) has reminded us that the
primary aim of a scientific experiment is not to
precipitate decisions, but to make an appropriate
adjustment in the degree to which one accepts, or
believes, the hypothesis or hypotheses being tested
[p. 420].


Since the traditional null hypothesis tests in 
psychological research pay no attention to the 
utilities of various outcomes, he suggested that 
the basic statistical report should be in the 
form of a confidence interval whenever possi- 
ble. It is obvious, in the case of correlation 
coefficients, for example, that the researcher 
is concerned with more than the fact that a 
particular coefficient is not zero—he hopes he 
is in a position to account for more than a 
trivial amount of variation. In sympathy 
with the Rozeboom position, a few of the 90% 
confidence limits for the data in Table 5 will 
be mentioned. For the largest nonzero R in 
the total sample of MZ twins, .59 for Schizo- 
phrenia (Sc), the limits are .35 and .74; for 
the smallest nonzero R, .39 for Hs, the limits 
are .13 and .61. For the corresponding Rs, 
in the sample of DZ twins, .19 for Sc, the 
limits are −.10 and .44; for Hs, .21, the limits
are −.08 and .46.

The results of testing whether or not the
correlation between MZ pairs is significantly
greater than that between DZ are presented
in Table 6. All Rs were first converted to
Fisher’s z’s. The MZ twins appeared to be
significantly higher than DZ on 5 of the 10
standard MMPI scales (p less than or equal
to the 5% level). The results from the correlational
analyses then left 5 of the 10 standard
scales, 2 (Depression, D), 4 (Psychopathic
deviate, Pd), 7 (Psychasthenia, Pt), 8
(Schizophrenia, Sc), and 0 (Social Introversion,
Si), which appeared to have significant genetic
(i.e., gene determined) components for the
combined group.


TABLE 6

ONE-TAILED TEST OF THE DIFFERENCE BETWEEN MZ
AND DZ MMPI SCALE INTRACLASS CORRELATIONS
FOR TOTAL GROUP, FEMALES, AND MALES

          Total group        Females           Males
         Normal             Normal            Normal
Scale    deviate    p       deviate    p      deviate    p
1 Hs       .79     .21       −.16               .47     .32
2 D       1.76     .04        .69     .24      1.56     .06
3 Hy       .30     .38       −.59              1.01     .16

Do        1.14     .13       −.64              3.00     .001
Dy        1.28     .10        .54     .30      1.47     .07
St        −.32              −1.45              1.44     .08

The heritability estimates for the MMPI
scales, which utilize only within-pair variances,
are presented in Table 7. While there
is no method yet for the computation of confidence
limits for H, its significance may be
tested through the significance of the F ratio
formed by the within-DZ pair variance divided
by the within-MZ pair variance. Even when the
computation of H in the present study showed
that 42% of the observed within-family variance
for the total group could be accounted
for by genetic factors, the associated F was
not statistically significant at the .05 level.
The infrequent reporting of the significance
of H in the literature together with the paucity
of positive results for even intellectual and
psychomotor tasks (Vandenberg, 1962) suggests
a need for mathematical clarification of
H. Within the limits of the assumptions for
this kind of analysis, this attempt at quantification
of the proportion of scale variance accounted
for by heredity gave positive results
for the same five scales identified by the correlational
analysis. Scales Pt and Sc showed
appreciable variance accounted for by heredity
but with environment predominating. Scales
D and Pd showed about equal contributions
of heredity and environment. Scale 0, Social
Introversion, showed a predominance of variance
(.71) accounted for by heredity. The
value of H for the Si scale is of the same magnitude
as that found in this study and others
for intelligence as measured by standard IQ
tests.


TABLE 7

MMPI SCALE HERITABILITY INDEXES FOR TOTAL
GROUP, FEMALES, AND MALES

          Total group      Females          Males
Scale      H     Fᵃ        H     Fᵃ         H     Fᵃ
1 Hs      .16   1.19      [illegible]      .01   1.01
2 D       .45   1.81      .22   1.28       .65   2.83
3 Hy      .00    .86      .00    .56       .43   1.74
4 Pd      .50   2.01      .37   1.60       .77   4.35
5 Mf      [illegible]     .00    .99       .45   1.83
6 Pa      .05   1.05      .00    .70       .52   2.09
7 Pt      .37   1.58      .47   1.89       .24   1.31
8 Sc      .42   1.71      .36   1.56       .50   2.00
9 Ma      .24   1.32      .33   1.50       .00    .81
0 Si      .71   3.42      .60   2.49       .84   6.14

K         .06   1.06      .00    .95       .26   1.35
Es        .00    .63      .00    .69       .00    .84
A         .21   1.26      .22   1.28       [illegible]
R         .00    .82      .00    .79       .00    .86
Do        .00    .95      .00    .76       .33   1.50
Dy        .24   1.32      .03   1.03       .45   1.82
St        .34   1.52      .14   1.16       .63   2.69

ᵃ The three values of F required for significance at the .05
level are 1.78, 2.04, and 2.72.


Cattell’s Factored Test 


Once again the excellent matching of the 
two classes of twins and their representative- 
ness of adolescents in general may be inferred 
from the comparison of mean scores on the 
14 scales with the normative data. Intraclass 
correlation coefficients for the MZ and DZ 
twins are given in Table 8. Eight of the 14 
scales had correlations between MZ twins
significantly greater than zero. Two scales, A and J, were
not significantly different from zero for both 
MZ and DZ twins. 

Six of the 14 factors resulted in correlation 
coefficients which were not significantly dif- 
ferent from zero for the MZ twins. That DZ 
should obtain significant correlations on 4 of 
these 6 was paradoxical. It is difficult to rec- 
oncile claims of construct validity for these 
6 scales with these results. Unless there were 
some logical a priori grounds for identical
twins to be opposed on some trait, their iden- 
tical heredity and/or their very similar en- 
vironment would lead us to expect other than 
a zero correlation between them for a per- 
sonality trait. The factor derivation of all the 
scales and their low intercorrelations permit- 
ted acceptance and interpretation of the re- 
maining eight scales on their own merit. Fac-
tors B, F, G, H, I, O, Q₂, and Q₃ at this point
in the analysis have the potential for showing 
a predominance of genetic variance. 

For the largest nonzero R in the total sam- 
ple of MZ twins (other than the intelligence
factor), .60 for Q₂, the 90% confidence limits
are .38 and .75; for the smallest nonzero R,
.30 for Q₃, the limits are .02 and .54. For the
corresponding Rs in the sample of DZ twins,
.15 for Q₂, the limits are −.13 and .42; for
Q₃, .12, the limits are −.16 and .39.

In Table 9 are presented the results of test- 
ing whether or not the correlation between 
MZ pairs is significantly greater than that 
between DZ. All Rs were first converted to 
Fisher’s z’s.

The MZ twins were significantly higher
than DZ on only one HSPQ factor, Q₂. For
8 of the 14 factors the differences were in the
predicted direction. The results from both
the correlation analyses then left only 1 factor,
Q₂, Group Dependency versus Self-Sufficiency,
which appeared to have significant
genetic (i.e., gene determined) components.

The heritability indexes for the HSPQ 
scales, computed only from the within-pair 
variances, are presented in Table 10. Within 
the limits of the assumptions for this analysis, 
this attempt at quantification of the propor- 
tion of scale variance accounted for by he- 
redity gives positive results for 6 of the 14 
factors. Factors E, Submissiveness versus 
Dominance; H, Shy, Sensitive versus Ad- 
venturous; and J, Liking Group Action versus 
Fastidiously Individualistic showed apprecia- 


TABLE 8

HSPQ INTRACLASS CORRELATIONS FOR MZ AND DZ
TWINS FOR TOTAL GROUP, FEMALES, AND MALES

          Total groups’ R    Females’ R    Males’ R
Factor      MZ      DZ       MZ     DZ     MZ    DZ


Py WAS. 2k 26. 40%. —10 05 
B 60** 61*** 65*** 66*** 56* 57* 
ree es Ske 74 0529 92 33 76** 
Dp 2 ar 23 OF 19-23 
E16 41% —06 O 33 —18 
Bis) 47** 5 12 29 16 64** 04 
G 49+  42** 23 33 76*** 56* 
H  38* 20 42* 34 34. = 15 
I sgt 47t 26 00 37 4 
J 26 —04 29 —08 24 —O1 
O 45% 37% sore si** 20 —04 
Q,  60*** 15 54** —02 six 42 
Os) 2) 308) 12 seve 34 —O1 —22 
O SA so 16 12. 73** 


TABLE 9

ONE-TAILED TEST OF THE DIFFERENCE BETWEEN MZ
AND DZ HSPQ SCALE INTRACLASS CORRELATIONS
FOR TOTAL GROUP, FEMALES, AND MALES

          Total group        Females           Males
         Normal             Normal            Normal
Factor   deviate    p       deviate    p      deviate    p
A 34 50 32 
B — .03 —.05 —.02 
Cc — 46 — 23 —1,45 
D —1.21 —1.56 —.07 
E —1.10 —2.07 1.13 5K 
F 1.56 06 44 33 1.56 .06 
G 36 36 — 33 Bid aad 
H 78 +22 33 37 1.09 14 
I 40 34 86 19 54 29 
J 1.20 11 1.23 abl 56 29 
oO 40 34 —.04 53 30 
Qe 2.13 02 2.00 02 24 41 
Qs 14 23 88 19 46 32 
Qa —.25 63 26 —1.78 


ble variance accounted for by heredity but 
with environment predominating. Factors F, 
Sober, Serious versus Happy-Go-Lucky; Q₂;
and O, Confident Adequacy versus Guilt 
Proneness showed about equal contributions 
of hereditary and environmental variance 
(.56, .56, and .46). 


Results of the Otis IQ Analysis


The results of the school administered in- 
telligence test are given at this point because 
Factor B of the HSPQ is a brief 20-item 
measure of intelligence. Intraclass correlations 
for the MZ and DZ twins were .83 and .59, 
respectively, both significant at the .001 
level with the first significantly greater than 
the second at the .02 level. The H value com- 
puted from the Otis within variances was .62. 
This means that 62% of the within-family 
intelligence variance measured by the Otis is 
accounted for by hereditary factors in this 
sample. 


Configural and Holistic Analyses 


Rank-Difference Correlations for the Coded 
MMPI Profiles 


Table 11 shows the distribution of the
Spearman rhos for the 68 pairs of profiles by
twin type. It is obvious that the overlap is
too great to permit other than chance discrimination
(χ²=.47, p > .25) between the two
kinds of twins on the basis of their MMPI
profile rank-difference correlation coefficients.
A cutting score for a rho of .40 and above
would correctly classify 53% of all the twin
pairs.


TABLE 10

HSPQ FACTOR HERITABILITY INDEXES FOR TOTAL
GROUP, FEMALES, AND MALES

          Total group      Females          Males
Factor     H     Fᵃ        H     Fᵃ         H     Fᵃ
A         .10   1.11      .00    .97       .24   1.32
B         .05   1.05      .00    .96       .18   1.22
C         .03   1.03      .25   1.34       .00    .43
D         .00    .62      .00    .37       .42   1.72
E         .31   1.44      .00    .80       .74   3.84
F         .56   2.29      .45   1.81       .74   3.83
G         .00    .97      .01   1.01       .00    .79
H         .38   1.62      .42   1.73       .34   1.32
I         .06   1.07      .05   1.05       .10   1.11
J         .29   1.41      .27   1.37       .34   1.51
O         .46   1.85      .22   1.29       .69   3.18
Q₂        [illegible]
Q₃        [illegible]
Q₄        [illegible]

ᵃ The three values of F required for significance at the .05
level are 1.78, 2.04, and 2.72.
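
The effect of the cutting score can be checked against the grouped frequencies of Table 11. In this sketch the function name is hypothetical, and the frequencies are entered as read from the table, with each interval represented by its lower bound.

```python
# (lower bound of the rho interval, MZ pairs, DZ pairs), as read from Table 11.
RHO_DISTRIBUTION = [
    (0.80, 2, 1), (0.70, 2, 1), (0.60, 3, 3), (0.50, 6, 6), (0.40, 4, 4),
    (0.30, 3, 2), (0.20, 3, 2), (0.10, 3, 5), (0.00, 2, 1), (-0.50, 6, 9),
]


def percent_correct(distribution, cutting_score):
    """Call a pair MZ when rho >= cutting_score and DZ otherwise; return % correct."""
    hits = 0
    total = 0
    for lower_bound, mz_pairs, dz_pairs in distribution:
        total += mz_pairs + dz_pairs
        hits += mz_pairs if lower_bound >= cutting_score else dz_pairs
    return 100.0 * hits / total


print(round(percent_correct(RHO_DISTRIBUTION, 0.40)))  # 53
```

With the cut at .40, 17 MZ pairs and 19 DZ pairs fall on the expected side, or 36 of 68.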


D Coefficient 


Table 12 shows the distribution of the generalized
distance function computed from the
68 pairs of profiles. This index abstracts information
about the shape and elevation of
the profile. There was a tendency for the identical
twins to have a lower D, i.e., be less dissimilar;
19 of the MZ pairs were below the
median of the combined group as contrasted
with 15 of the DZ pairs (χ²=.94, p < .17).
Using the median D as a cutting score resulted
in correct classification of 56% of the profiles.


TABLE 12
DISTRIBUTION OF MZ AND DZ MMPI PROFILE D
COEFFICIENTS

D        MZ   DZ
15-19     1    0
20-24     4    3
25-29    10    8
30-34     8    5
35-39     5    4
40-44     3    4
45-49     1    2
50-54     1    3
55-59     0    3
60-64     1    0
65-69     0    1
70-74     0    1
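
The median-split figures for D can be reproduced with a short sketch. The function names are hypothetical, and the use of a chi-square without continuity correction is an assumption, adopted because it matches the reported value.

```python
def median_split_chi_square(mz_below, dz_below, pairs_per_group):
    """Chi-square (1 df, no continuity correction) for a 2 x 2 median split."""
    a, b = mz_below, pairs_per_group - mz_below  # MZ pairs: below, above
    c, d = dz_below, pairs_per_group - dz_below  # DZ pairs: below, above
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def percent_classified(mz_below, dz_below, pairs_per_group):
    """Percent correct when 'below the median' is read as MZ and 'above' as DZ."""
    correct = mz_below + (pairs_per_group - dz_below)
    return 100.0 * correct / (2 * pairs_per_group)


# 19 of 34 MZ pairs and 15 of 34 DZ pairs fell below the combined median D.
print(round(median_split_chi_square(19, 15, 34), 2))  # 0.94
print(round(percent_classified(19, 15, 34)))          # 56
```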


Concordance of Test Behavior (TT’) 


The percentage of all MMPI test items
(566) answered in the same direction for a
pair of twins is given in Table 13. There was
a tendency for the identical twins to have a
greater overlap in their responses to all the
items; 21 of the MZ pairs were above the
median of the combined group as contrasted
with 15 of the DZ pairs (χ²=2.35, p < .06).
Using the median TT’ of 72% as a cutting
score resulted in correct classification of 59%
of the profiles. There was some improvement
in the use of this index when only the items
in the 10 clinical scales were used as the denominator
(337); 21 of the MZ pairs exceeded
the median compared to 14 DZ pairs
(χ²=2.94, p < .05). The cutting score was
the same and correctly classified 60% of the
profiles.


TABLE 11
DISTRIBUTION OF MZ AND DZ MMPI PROFILE
RANK-DIFFERENCE CORRELATIONS

Rho             MZ   DZ
.80-.89          2    1
.70-.79          2    1
.60-.69          3    3
.50-.59          6    6
.40-.49          4    4
.30-.39          3    2
.20-.29          3    2
.10-.19          3    5
.00-.09          2    1
−.50 to −.01     6    9


TABLE 13
DISTRIBUTION OF MZ AND DZ TOTAL MMPI ITEM
AGREEMENT PERCENTAGES (TT’)

TT’      MZ   DZ
85-89     1    1
80-84     6    4
75-79     4    3
70-74    13   11
65-69     7   10
60-64     2    2
55-59     0    2
50-54     1    1


Clinical Judgment of Profile Similarity 


The extent to which the judges’ clinical 
assessment of personality similarity agreed 
with the zygosity of the twin pairs is given in 
Table 14. Computing the combined p levels 
(Mosteller & Bush, 1954, p. 329) for the ac- 
curacy of the three judges on their sorting of 
all profiles resulted in a p equal to .003. The 
comparable figure for accuracy of judging 
the two extreme groups of Very Similar and 
Very Dissimilar was .004. By pooling the 
ratings for each pair (i.e., 2 of 3, or 3 of 3 
votes) in an effort to correct for the various 
sources of attenuation in the “configural pow- 
ers” of the clinician, the accuracy of the total 
sort increased to 67.6% (z=2.90, p=.005).
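
The z values reported for the total sort are consistent with a normal approximation to the binomial, taking chance accuracy as 50% over the 68 profile pairs. That this is the computation actually used is an assumption; the hit counts below are inferred from the reported percentages (for example, 64.7% of 68 is 44) and the function name is illustrative.

```python
import math

def accuracy_z(correct, total, chance=0.5):
    """z score of a hit count against chance, by the normal approximation to the binomial."""
    expected = total * chance
    sd = math.sqrt(total * chance * (1 - chance))
    return (correct - expected) / sd

N_PAIRS = 68  # 34 MZ pairs plus 34 DZ pairs

# Hit counts inferred from the reported percentages; they are not given in the text.
hits = {"A": 44, "B": 42, "C": 40, "pooled": 46}

for judge, k in hits.items():
    print(judge, f"{100 * k / N_PAIRS:.1f}%", round(accuracy_z(k, N_PAIRS), 2))
```

The resulting z values agree with those reported for Judges A, B, and C and for the pooled ratings to within about .01, which is the size of difference that rounding alone could produce.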


Discussion 


In the introduction the purpose of the pres- 
ent research was said to be to answer this 
question—How much of the variability ob- 
served within a group of individuals in a 
specified environment on a specific measure of 
a specific personality trait is attributable to 
genetic factors? Implicit in the posing of this 
question was the assumption, subsequently 
confirmed by the data, that there were meas- 
urable genetic influences for at least some of 
the aspects of human personality tapped by 
the selected personality tests. Since the quan-
tification of genetic variability derived from
the computation of heritabilities by the classical
twin method, and since this method rests
upon some unproved assumptions, the results
are recognized as suggestive and heuristic
rather than definitive. While the data language
for this research consists of scores on scales of
personality tests, the discussion which follows
is in terms of the underlying biophysical traits
which the scales (read constructs) reflect.


TABLE 14
AGREEMENT OF CLINICAL JUDGMENTS OF MMPI
PROFILE SIMILARITY WITH TWIN ZYGOSITY

                                      Extreme
Judge         Total sort   z      p      pile sort   z      p
A             64.7%        2.43   .01    67.6%       2.06   .03
B             61.8%        1.95   .03    73.5%       2.74   .006
C             58.8%        1.45   .08    58.8%       1.03   .16
Combined p                        .003                      .004

The first section of the discussion deals with 
the specific traits and their configurations 
demonstrated to have been influenced by he- 
reditary factors together with some of the 
implications of these data for personality 
theory and the etiology of mental illness. 
This is followed by an attempt to explain the 
apparent failure of some of the holistic analy- 
ses of personality to support strongly the find- 
ings of the trait analyses. 

To what extent can the results of the present 
study be applied to human behavior in gen- 
eral? The representativeness of the twin 
sample to Minnesota adolescents in general 
suggests that this kind of extrapolation is 
fairly safe. There is the possibility that the 
twins, preponderantly of Scandinavian extrac- 
tion, sample a unique gene pool. Whether 
the further extrapolation to adults can be 
safely made is dubious and difficult to assess. 
When rate of development enters the picture, 
biological influences on a trait might be em- 
phasized compared to a final adult stage. The 
present results could be very different if de- 
rived from adult twins as suggested by studies 
of morphology (Osborne & DeGeorge, 1959). 
Another important question is the extent to 
which these data from normal nonhospitalized
individuals can be applied to the identifiably
psychiatrically ill. A basic assumption throughout
this project was that the measured aspects of 
personality varied continuously, and that any 
underlying genetic mechanisms were polygenic 
in nature. The possibility that the extremes 
of distributions for some personality char- 
acteristics constitute discrete series exists, but 
this phenomenon would be masked by the 
strategy used. The net effect would be the 
underestimation of the importance of heredity 
since concordances in MZ twins for low inci- 
dence Mendelian or polygenic characteristics 
would pass unnoticed in the analysis of quan- 
titative variability. 




Genetic Aspects of Personality 


A total of 6 traits out of the standard 24 
in the two tests met a criterion which clas- 
sified them as significantly influenced by ge- 
netic factors; that is, correlations between the 
scores of identical twins were significantly
higher than those between the scores of fra- 
ternal twins. Beyond this, for 23 of the total 
31 scales analyzed in the two personality tests, 
the differences were in the predicted direc-
tion. Trait Q2, Self-Sufficiency, was the only
survivor of the 14 HSPQ measures. Although
Factor Q2 is not thought to be clearly estab-
lished, the item content suggests that a person
who is resolute and accustomed to making his 
own decisions will obtain a high score, while 
low scorers would be described as followers 
and conformists. A synthesis of this trait and 
F, Surgency (significant at the .06 level), is 
provided by Cattell’s large second-order fac- 
tor, Extraversion versus Introversion, which 
is composed of four factors. Two of the four 
are F and Q2. Tying in neatly with the
MMPI findings which are discussed next is 
a study on the construct validity of the adult 
form of the HSPQ by Karson and Pool (1957). 
These authors found the highest MMPI scale 
correlate of Factor F to be Scale Si, Social 
Introversion, and the highest correlate of Q2
to be Si also with correlations of —.48 and .32, 
respectively. Together such results appear to 
identify a general dimension closely related to 
introversion-extraversion as one which is 
heavily influenced by genetic factors. Such a 
conclusion has also been reached by Eysenck 
(1947, 1956) who isolated introversion-ex- 
traversion as one of the two (now three) 
dimensions of personality by a factor analysis 
of ratings and personal data on 700 neurotic 
soldiers. He considers his findings to repre- 
sent a confirmation of the theoretical ideas of 
Jung (1933). Genetic factors are given a prom- 
inent place in Eysenck’s typology; his twin 
study (Eysenck, 1956) using statistics similar 
to those in the present study, yielded a tenta- 
tive value for H on a factored measure of 
introversion-extraversion of .62. 

The trait of introversion as measured by 
the MMPI may also have implications for one 
of the genetic theories of schizophrenia. 
Patients with very high scores on Scale Si are 


clinically described as “schizoid” and Kall- 
mann (1953) and others have suggested that 
the schizoid individual may represent the ge- 
netic “carrier state” of the recessive schizo- 
phrenic gene. In other words, the schizoid 
individual may represent the heterozygote and 
the schizophrenic may represent the homozy- 
gote. If the schizoid carrier can be identified, 
Kallmann’s hypothesis about recessivity is no 
longer tenable. The mode of inheritance must 
then be that of incomplete dominance or poly- 
genic, thus better accounting for the high 
familial incidence of schizophrenia (cf. Greg- 
ory, 1960). The magnitude of the herit- 
ability for Scale Si was the largest found in 
the present study. The belief in the genetic 
contribution to intelligence has come to have 
a fairly secure status in contemporary psychol- 
ogy (Gottesman, 1963); the results of this 
investigation indicate that a similar status is 
appropriate for the more pure personality trait 
of introversion. 

The results concerning the four remaining 
MMPI scales, D, Pd, Pt, and Sc, lend sup- 
port to the general idea that in human beings 
psychopathology, especially psychosis, has a 
substantial genetic component. The scoring 
keys for these scales were developed for the 
purpose of locating patients with respect to 
psychiatrically diagnosed states of depression, 
psychopathic personality, psychasthenia, and 
schizophrenia. Patients of each type were 
compared, item by item, with a normal group. 
Those items which statistically differentiated 
a diagnostic group from normals were then 
included in the appropriate scale. Taken 
singly, these MMPI scales are able to dis- 
criminate about 60% of the patients corre- 
sponding to their label at a cost of 5% false 
positives among the normals; however, the 
single scale approach has been largely sup- 
planted by the interpretation of the entire 
profile with particular attention to the two 
or three highest scores. The use of the pattern 
types formed by the various combinations of 
high scores has led to their establishment as 
constructs which can be used in lieu of diag- 
nostic classes. Many correlations have been 
observed between other variables and the per- 
sonality construct patterns. For the D-Sc 
pattern, for example, a majority of the diag-
noses among psychiatric patients obtaining
such a configuration were psychotic ones,
either depression or schizophrenia. 


Heredity, defined here rather crudely simply as psy- 
chosis in siblings or parents, tended to be unfavor- 
able in these individuals [Hathaway & Meehl, 1956, 
p. 143]. 


Recent research on the D-Pt type (Gilber-
stadt & Duker, 1960) showed that it could
be analyzed into three subtypes. Among other 
important findings, the authors observed that 
the D-Pt-Sc type of MMPI profile was char- 
acterized by a diagnosis of chronic undifferen- 
tiated schizophrenia and that such patients 
had much in common with descriptions in the 
literature of pseudoneurotic schizophrenia 
(Peterson, 1954). Surveys of the significance 
of major configural patterns are available in 
Hathaway and Meehl (1951) as well as in 
Dahlstrom and Welsh (1960). 

A discussion of the results of the attempted 
quantification of hereditary influence adds 
some new information but it is not on the 
same firm footing as the correlational results. 
Eleven of the 24 traits measured by the two 
tests showed at least an appreciable genetic 
component. By appreciable is meant one third 
or more of the trait variance accounted for 
relative to the contribution of environmental 
factors (this required an H of .25 or more so 
that H divided by one minus H equalled one 
third). HSPQ factors, F, Q2, and O showed 
about equal roles for heredity and environ- 
ment. Factor O in the Karson and Pool 
(1957) study correlated most highly, .77, 
with MMPI Scale 7 (Psychasthenia) and 
.54 with Scale 0 (Social Introversion). Scale 7 
was also found to have an appreciable genetic 
component in the present study. Three more 
HSPQ factors survived the criterion, E, H, 
and J. H along with F and Q2 formed three
of the four factors in Cattell’s second-order
factor Extraversion versus Introversion. In
a study using an early form of the HSPQ
(Cattell et al., 1955) both E and J were
found to have appreciable genetic components.
All five of the MMPI scales surviving the
correlation criterion appeared in the quanti- 
tative analysis as having an appreciable ge- 
netic component. Only Scale Si, as noted 
above, was predominantly genetically de- 
termined. 
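
The criterion for an "appreciable" genetic component stated above is simple arithmetic: the ratio of genetic to remaining variance, H / (1 - H), must reach one third, which happens at H = .25. The sketch below illustrates that ratio; the function name is illustrative, and the two comparison values (.62 for the Otis IQ and .71 for Scale Si) are the heritabilities quoted in the Summary.

```python
def genetic_to_environmental_ratio(h):
    """Ratio of variance attributed to heredity to the remainder, H / (1 - H)."""
    if not 0 <= h < 1:
        raise ValueError("H must lie in [0, 1)")
    return h / (1 - h)

examples = [
    ("criterion for 'appreciable'", 0.25),
    ("Otis IQ", 0.62),
    ("Scale Si", 0.71),
]

for label, h in examples:
    print(label, round(genetic_to_environmental_ratio(h), 2))
```

A ratio above 1 corresponds to heredity predominating, a ratio near 1 to roughly equal roles, and a ratio between one third and 1 to an appreciable component with environment predominating.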

Heritability values given in Tables 7 and
10 as well as those in other twin studies give
the proportion of variance produced by ge-
netic differences within families and under-
estimate the effects of heredity in the general
population. If we could assume random mat-
ing for the traits measured, the values in
Tables 7 and 10 would have to be multiplied 
by a factor of approximately two to indicate 
the true degree of genetic determination. The 
values thus obtained are remarkable even 
when compared with the results from animal 
behavior genetics. McClearn (1961) reported 
a value of .69 for activity in strains of mice. 
Fuller and Thompson (1960) reported a 
heritability of .59 for exploratory behavior 
in two strains of mice. In animal breeding 
intense selection pressure permits the value of 
heritability to become large for such charac- 
ters as egg weight and wool length (Falconer, 
1960). An overview of the relative contri- 
butions of heredity and environment to 
within-family variance on the personality 
traits measured leads to the observation that 
environment is the preponderant influence in 
a majority of the traits. Cattell, Stice, and 
Kristy (1957) reached a similar conclusion 
for both within- and between-family variances. 


Sex Differences 


It is difficult to evaluate the meaning to 
be attached to the observed sex differences 
in heritability of the various traits since the 
sample sizes are reduced. It is tempting to 
speculate about the possible evolutionary ori- 
gin of sex differences. Confidence in, and 
extension of, the interpretations which follow 
must await the completion of an ongoing 
study using 180 pairs of twins. To suggest 
that variation in a personality trait is more 
under genetic control in one sex than the other, 
it is only necessary to be reminded of the 
range of secondary sex differences already 
observed in physical and behavioral characters 
in both man and animal. The differentiation 
of such traits is brought about by hormones; 
the latter are one of the links in the gene to 
behavior pathway. 

Changing from an expressive environment 
to a suppressive one will lower the heritability 
of a trait. Suppose, for example, that early 
experience in fighting is essential for inducing 
aggressive behavior. Genetic differences in
variability on the Pd scale will not be detected
in boys reared without this experience. The 
process of sex typing in our culture restricts 
certain types of behavior in the two sexes. 
Fighting is not tolerated and is suppressed 
in little girls so that we might expect the 
value of H for females on the Pd scale to be 
less than that for males. The results for 
Factor E, Submissiveness versus Dominance, 
and for Surgency reveal the same pattern of 
greater heritability for males than females. 
This attempt at an environmental explanation 
for the sex difference observed in trait herita- 
bility should not be thought of as appropriate 
for all the patterns observed. 


Factorially Derived Scales versus Empirically 
Derived Scales 


The positive correlational results for only
1 of the 14 HSPQ factors could have been at-
tributed to chance. In comparison with the
positive results for 5 of the 10 MMPI scales, 
the harvest from the factorially derived per- 
sonality test looks poor. The validity of 6 of 
the 14 HSPQ scales was cast into doubt by 
finding a zero correlation between identical 
twins. Many psychologists (e.g., Hall &
Lindzey, 1957) have noted that factors de- 
rived by factor analysis are often not psy- 
chologically meaningful and do not agree with 
reality. Perhaps the empirical derivation of 
the MMPI scales was such as to allow Nature 
to be carved, albeit imperfectly, at the joints. 
The ease with which MMPI scales can be 
factored into subscales may mean that the 
original scales and those derived in similar 
fashion are equivalent to the large second- 
order factors of the factor analysts. The Si 
scale, for example, correlates with 10 of Cat- 
tell’s factors (Karson, 1958). Demonstration 
of behavioral correlates for factors, such as 
Eysenck’s (1950) criterion analysis, could 
result in a formidable merger of the ideas of 
Cattell, Eysenck, Hathaway, and Meehl. 
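
The remark that one significant HSPQ factor out of 14 could be a chance result, while five of 10 MMPI scales could hardly be, can be illustrated with a binomial calculation. The sketch assumes each scale is tested independently at the .05 level; that independence is an assumption made here for illustration, since scales within a test share items and are correlated.

```python
from math import comb

def prob_at_least(k, n, alpha=0.05):
    """P(at least k of n independent tests reach significance by chance at level alpha)."""
    return sum(
        comb(n, j) * alpha**j * (1 - alpha) ** (n - j)
        for j in range(k, n + 1)
    )

p_hspq = prob_at_least(1, 14)  # at least 1 of the 14 HSPQ factors
p_mmpi = prob_at_least(5, 10)  # at least 5 of the 10 MMPI scales

print(f"HSPQ: {p_hspq:.2f}", f"MMPI: {p_mmpi:.1e}")
```

Under these assumptions the chance of at least one spurious HSPQ result is roughly one half, whereas the chance of five or more spurious MMPI results is well below one in ten thousand.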


Fate of Attempts at a Holistic View of 
Personality 


Although all three statistical measures of 
MMPI profile similarity, rho, D, and TT’, 
tended to support the hypothesis of greater 
personality similarity between isogenic indi-
viduals, only the clinical judgments of simi-
larity gave substantial support. Inasmuch as
the statistical method usually surpasses the 
clinical in psychology (Meehl, 1954), these 
findings require close attention. Factors fa- 
voring successful clinical prediction in a profile 
sorting task have been suggested by Meehl 
(1959). Perhaps the most directly relevant 
factor he mentioned was the clinician’s ability 
to analyze a configural relationship existing 
between predictor variables and a criterion, 
when the function is not derivable on rational 
grounds. 

Even casual inspection of the results of the 
clinical judgments of personality similarity is
conducive to accepting the general hypoth- 
esis of this research—the greater the gene 
similarity, the greater the overall personality 
similarity. The judging task required a dis- 
crimination between sisters, for example, of 
the same age, with the same parents, sharing 
more or less the same environment for their 
entire lives, but who differed in the amount 
of genetic overlap by a factor of only two. 
We might speculate that the task would have 
been easier had the pairs consisted of un- 
related individuals or first cousins of the same 
age, sex, and social class background con- 
trasted with identical twins. Misjudgments 
from the pooled ratings of the judges afford 
some insight to the range of heredity-environ- 
ment interactions. The amount of variability 
available to the same genotypes is shown by 
the fact that 10 of the 34 pairs of identical 
twins’ profiles were classified as dissimilar. 
Conversely, the lack of variability in per- 
sonality available to genotypes with only half 
their genes in common is shown by the fact 
that 12 of the 34 fraternal twins’ profiles were 
classified as similar.
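
The misjudgment counts just given account for the pooled accuracy reported in the Results. A minimal check, using only the counts stated in the text (the variable names are illustrative):

```python
PAIRS_PER_CLASS = 34

mz_called_dissimilar = 10  # identical pairs whose profiles were judged dissimilar
dz_called_similar = 12     # fraternal pairs whose profiles were judged similar

correct = (PAIRS_PER_CLASS - mz_called_dissimilar) + (PAIRS_PER_CLASS - dz_called_similar)
accuracy = correct / (2 * PAIRS_PER_CLASS)

print(correct, f"{100 * accuracy:.1f}%")
```

This gives 46 correct classifications out of 68, or 67.6%, matching the pooled figure for the total sort.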


Summary and Conclusions 


The present study was carried out in the 
context of behavior genetics, the interdis- 
ciplinary science combining the knowledge and 
procedures of modern genetics with those of 
psychology. By means of the classical twin 
method, as it has been recently improved, and 
objective personality tests, the purpose of the 
research was to answer the question, how much 
of the variability observed within a group
of individuals in a specified environment on
specific measures of specific personality traits 
is attributable to genetic factors? It was 
recognized that the genetic variance, herita- 
bility (H), could vary according to such 
things as age, sex, culture, trait, and method 
of trait assessment. 

Among the key constructs from genetics, 
reaction range and polygenic inheritance are 
central to the methodology used and inter- 
pretation of results. Heredity fixes a reaction 
range; within this framework a genotype de- 
termines an indefinite but circumscribed as- 
sortment of phenotypes, each of which corre- 
sponds to one of the possible environments 
to which the genotype may be exposed. The 
classical Mendelian model of dominant and 
recessive gene inheritance will not handle 
the data on continuous variation, the kind 
observed with human behavior. Polygenic 
systems are posited to account for quantita- 
tive inheritance, the phenotypic effects being 
simply a function of the number of genes 
present. Both the twin method and its limi- 
tations were discussed. 

Thirty-four pairs each of identical (MZ) 
and fraternal (DZ) same-sex adolescent twins 
from the public high schools of Minneapolis, 
St. Paul, and Robbinsdale, Minnesota, served 
as the sample. The entire population of same- 
sex twins among the 31,000 children in the 
schools was first enumerated. Forty-five pairs
of girls and 23 pairs of boys volunteered which 
represented 75% and 43%, respectively, of 
the total possible twin pairs available in the 
selected sample of schools. Disregarding sex, 
the twins comprised 60% of the possible 113 
pairs. 

At the time of testing, the twins filled out 
a personal history data sheet, the Minnesota 
Multiphasic Personality Inventory (MMPI), 
and Cattell’s High School Personality Ques- 
tionnaire (HSPQ); they were weighed, meas- 
ured for height, fingerprinted, and photo- 
graphed. The diagnosis of zygosity was based 
upon extensive blood grouping with respect 
to nine blood groups. This resulted in 100% 
accuracy in the diagnosis of DZ twins and at 
least 95% for MZ twins. A methodological 
contribution to twin diagnosis was made by 
the comparison of the accuracies of various 
methods and their combinations. Blood typing
is necessary and sufficient for the accuracy
required in human behavior genetics research. 

Each scale of the two personality tests and 
the school recorded Otis IQ were first analyzed
by means of the intraclass correlation coeffi- 
cient for the two classes of twins and for sex 
differences. Subsequent to this analysis, the 
heritability indexes were computed; H is de- 
fined as the proportion of personality scale 
variance attributable to genetic factors. The 
correlation analysis of the 14 HSPQ scales 
suggested that two factors (F, Sober, Serious 
versus Enthusiastic, Happy-Go-Lucky, and Q2,
Group Dependency versus Self-Sufficiency) 
had significant genetic (i.e., gene determined) 
components. The correlation analysis of the 
10 MMPI scales resulted in five, Scale 2 
(Depression), Scale 4 (Psychopathic Devi- 
ate), Scale 7 (Psychasthenia), Scale 8 (Schizo-
phrenia), and Scale 0 (Social Introversion),
which appeared to have significant genetic 
components. 

Within the limits of the assumptions, the 
attempt at quantification of the proportion 
of scale variance accounted for by heredity 
gave positive results for six of the HSPQ 
factors. Factors E, Submissiveness versus 
Dominance; H, Shy, Sensitive versus Ad- 
venturous; and J, Liking Group Action versus 
Fastidiously Individualistic showed appreci- 
able variance accounted for by heredity but 
with environment predominating. Factors F, 
Q2, and O, Confident Adequacy versus Guilt
Proneness, showed about equal contri- 
butions of heredity and environment. The 
same kind of analysis of the MMPI gave 
positive results for 5 of the 10 scales. Scales 
7 (Psychasthenia) and 8 (Schizophrenia) 
showed appreciable variance accounted for 
by heredity but with environment predominat- 
ing. Scales 2 (Depression) and 4 (Psycho- 
pathic Deviate) showed about equal contri- 
butions of heredity and environment. Scale 0 
(Social Introversion) showed a predominance 
of variance (H=.71) accounted for by he- 
redity. The value of H for the Otis IQ in 
this study was .62. 

Following the scale-by-scale analysis, three 
holistic statistical analyses and one clinical 
holistic analysis of the MMPI profiles were 
done. The rank-difference correlations (rho) 
for the coded MMPI profiles, the generalized
distance function (D), and a measure of test
item verbal behavior concordance (TT') all
showed a tendency for the MZ profile pairs
to be more similar. The tendency was not
strong enough to discriminate between the
two classes of twins on the basis of any of
the three measures. Clinical judgments of
profile similarity by three experts supported
the general hypothesis that the greater the
gene similarity, the greater the personality
similarity. The pooled accuracy of the agree-
ment of clinical judgments of MMPI profile
similarity with twin zygosity was 68% (p=.005).

Elaboration of the various biochemical or 
structural differences in the gene to behavior 
pathway which correspond to the results re- 
ported here might be intellectually satisfying, 
but progress in personality genetics need not
await this step. A useful taxonomy of the
aspects of behavior termed personality can be 
facilitated by the use of relatively invariant 
psychometric configurations to describe be- 
havioral phenotypes. An important by-prod- 
uct of the study of twins and a concern with 
human behavior genetics is the emphasis 
given to the need for a multidisciplinary 
analysis of behavior ranging from biochemistry 
through evolution. Granting that the difficul- 
ties in accurately assessing the contribution 
of heredity to variation in socially important 
behavior are great, such efforts will not have 
been in vain if they contribute to a greater 
understanding of the sources of individual 
differences. The provision of an optimum 
environment for the optimum development 
of the various aspects of human behavior 
should follow such increased understanding. 


REFERENCES 


Allen, G. Comments on the analysis of twin
samples. Acta genet. med. Gemellolog., 1955, 4,
143-160.

Allen, G. Patterns of discovery in the genetics of
mental deficiency. Amer. J. ment. Defic., 1958, 62,
840-849.

Allen, G. Intellectual potential and heredity.
Science, 1961, 133, 378-379.

Anastasi, Anne. Heredity, environment, and the
question “How?” Psychol. Rev., 1958, 65, 197-208.

Black, J. D. The interpretation of MMPI profiles
of college women. Unpublished doctoral
dissertation, University of Minnesota, 1953.

Cattell, R. B. The description and measurement of
personality. New York: World Book, 1946.

Cattell, R. B. Personality. New York: McGraw-Hill,
1950.

Cattell, R. B. Research designs in psychological
genetics with special reference to the multiple
variance method. Amer. J. hum. Genet., 1953, 5,
76-91.

Cattell, R. B., Beloff, H., & Coan, R. W. Handbook
for the IPAT High School Personality
Questionnaire. Champaign, Ill.: Institute of
Personality Ability Testing, 1958.

Cattell, R. B., Blewett, D. B., & Beloff, J. R.
The inheritance of personality. Amer. J. hum.
Genet., 1955, 7, 122-146.

Cattell, R. B., Stice, G. F., & Kristy, N. F. A
first approximation to nature-nurture ratios for
eleven primary personality factors in objective
tests. J. abnorm. soc. Psychol., 1957, 54, 143-160.

Cronbach, L. J. Essentials of psychological testing.
New York: Harper, 1960.

Cronbach, L. J., & Gleser, Goldine. Assessing
similarity between profiles. Psychol. Bull., 1953,
50, 456-473.

Cronbach, L. J., & Meehl, P. E. Construct validity
in psychological tests. Psychol. Bull., 1955, 52,
281-302.

Cummins, H., & Midlo, C. Fingerprints, palms, and
soles. Philadelphia: Blakiston, 1943.

Dahlstrom, W. G., & Welsh, G. S. (Eds.) An
MMPI handbook. Minneapolis: Univer. Minnesota
Press, 1960.

Dobzhansky, T. Evolution, genetics, and man. New
York: Wiley, 1955.

Eysenck, H. J. Dimensions of personality. London:
Kegan Paul, 1947.

Eysenck, H. J. Criterion analysis. Psychol. Rev.,
1950, 57, 38-53.

Eysenck, H. J. The inheritance of
extroversion-introversion. Acta psychol., Amsterdam,
1956, 12, 95-110.

Falconer, D. S. An introduction to quantitative
genetics. New York: Ronald, 1960.

Fisher, R. A., & Yates, F. Statistical tables for
biological, agricultural and medical research. (3rd
ed.) New York: Hafner, 1949.

Fuller, J. L. The genetic base: Pathways between
genes and behavioral characteristics. In, The
nature and transmission of the genetic and cultural
characteristics of human populations. New York:
Milbank Foundation, 1957. Pp. 101-111.

Fuller, J. L. Behavior genetics. Annu. Rev.
Psychol., 1960, 11, 41-70.

Fuller, J. L., & Thompson, W. R. Behavior
genetics. New York: Wiley, 1960.

Galton, F. The history of twins as a criterion of
the relative powers of nature and nurture. Fraser’s
Mag., 1875, 12, 566-576.

Gilberstadt, H. G., & Duker, Jan. Case history
correlates of three MMPI profile types. J. consult.
Psychol., 1960, 24, 361-369.




Gottesman, I. I. The psychogenetics of personality.
Unpublished doctoral dissertation, University of
Minnesota, 1960.

Gottesman, I. I. Genetic aspects of intelligent
behavior. In N. Ellis (Ed.), The handbook in
mental deficiency. New York: McGraw-Hill, 1963.
Ch. 7.

Gregory, I. Genetic factors in schizophrenia. Amer.
J. Psychiat., 1960, 116, 961-972.

Haggard, E. A. The intraclass correlation coefficient.
New York: Dryden, 1958.

Hall, C. S., & Lindzey, G. Theories of personality.
New York: Wiley, 1957.

Hathaway, S. R., & Briggs, P. F. Some normative
data on new MMPI scales. J. clin. Psychol., 1957,
13, 364-369.

Hathaway, S. R., & Meehl, P. E. An atlas for the
clinical use of the MMPI. Minneapolis: Univer.
Minnesota Press, 1951.

Hathaway, S. R., & Meehl, P. E. Psychiatric
implications of code types. In G. S. Welsh & W. G.
Dahlstrom (Eds.), Basic readings on the MMPI.
Minneapolis: Univer. Minnesota Press, 1956. Pp.
136-144.

Hathaway, S. R., & Monachesi, E. D. An atlas
of juvenile MMPI profiles. Minneapolis: Univer.
Minnesota Press, 1961.

Holzinger, K. J. The relative effect of nature and
nurture influences on twin differences. J. educ.
Psychol., 1929, 20, 241-248.

Jackson, D. D. A critique of the literature on the
genetics of schizophrenia. In D. D. Jackson (Ed.),
The etiology of schizophrenia. New York: Basic
Books, 1960. Pp. 37-87.

Jung, C. G. Psychological types. New York:
Harcourt, 1933.

Kallmann, F. J. The genetic theory of
schizophrenia. Amer. J. Psychiat., 1946, 103, 309-322.

Kallmann, F. J. Heredity in health and mental
disorder. New York: Norton, 1953.

Kallmann, F. J. Psychogenetic studies of twins. In
S. Koch (Ed.), Psychology: A study of a science.
Vol. 3. Formulations of the person and the social
context. New York: McGraw-Hill, 1959. Pp. 328-362.

Kallmann, F. J., & Baroff, G. S. Abnormalities of
behavior (in the light of psychogenetic studies).
Annu. Rev. Psychol., 1955, 6, 297-326.

Karson, S. Second order personality factors and the
MMPI. J. clin. Psychol., 1958, 14, 313-315.

Karson, S., & Pool, K. B. The construct validity
of the Sixteen Personality Factors Test. J. clin.
Psychol., 1957, 13, 245-252.

Lejeune, J., & Turpin, R. Chromosomal aberrations
in man. Amer. J. hum. Genet., 1961, 13, 175-184.

Loevinger, Jane. On the proportional contribution
of differences in nature and in nurture to differences
in intelligence. Psychol. Bull., 1943, 40, 725-756.


Loevinger, Jane. Objective tests as instruments of
psychological theory. Psychol. Rep., 1957, 3, 635-

McClearn, G. E. Genotype and mouse activity.
J. comp. physiol. Psychol., 1961, 54, 674-676.

Meehl, P. E. Clinical versus statistical prediction.
Minneapolis: Univer. Minnesota Press, 1954.

Meehl, P. E. A comparison of clinicians with five
statistical methods of identifying psychotic MMPI
profiles. J. counsel. Psychol., 1959, 6, 102-109.

Meehl, P. E., & Rosen, A. Antecedent probability
and the efficiency of psychometric signs, patterns,
or cutting scores. Psychol. Bull., 1955, 52, 194-216.

Mosteller, F., & Bush, R. R. Selected quantitative
techniques. In G. Lindzey (Ed.), Handbook of
social psychology. Boston: Addison-Wesley, 1954.
Pp. 289-334.

Newman, H. H. Multiple human births. New York:
Doubleday, 1940.

Newman, H. H., Freeman, F. N., & Holzinger,
K. J. Twins: A study of heredity and environment.
Chicago: Univer. Chicago Press, 1937.

Osborne, R. H., & DeGeorge, F. V. Genetic basis
of morphological variation. Cambridge: Harvard
Univer. Press, 1959.

Osgood, C. E., & Suci, G. J. A measure of relation
determined by both mean difference and profile
information. Psychol. Bull., 1952, 49, 251-262.

Peterson, D. R. The diagnosis of subclinical
schizophrenia. J. consult. Psychol., 1954, 18, 198-200.

Price, B. Primary biases in twin studies. Amer. J.
hum. Genet., 1950, 2, 293-352.

Race, R. R., & Sanger, R. Blood groups in man.
(3rd ed.) Springfield: Charles C Thomas, 1958.

Rozeboom, W. W. The fallacy of the null-hypothesis
significance test. Psychol. Bull., 1960, 57, 416-428.

Slater, E. Psychotic and neurotic illnesses in twins.
London: Her Majesty’s Stationery Office, 1953.

Smith, Sheila M., & Penrose, L. S. Monozygotic
and dizygotic twin diagnosis. Ann. hum. Genet.,
1955, 19, 273-289.

Stern, C. Principles of human genetics. (2nd ed.)
San Francisco: Freeman, 1960.

Strandskov, H. H., & Edelen, E. W. Monozygotic
and dizygotic birth frequencies in the total, in the
“white” and the “colored” U. S. populations.
Genetics, 1946, 31, 438-446.

Tryon, R. C. Behavior genetics in social psychology.
Amer. Psychologist, 1957, 12, 453. (Abstract)

Vandenberg, S. G. The hereditary abilities study:
Hereditary components in a psychological test
battery. Amer. J. hum. Genet., 1962, 14, 220-237.

Wirt, R. D., & Briggs, P. F. Personality and
environmental factors in the development of
delinquency. Psychol. Monogr., 1959, 73 (15, Whole No.
485).

(Received February 7, 1963) 


CONTENTS 


Brain Injury in the Preschool Child: Some Developmental Considerations:
I. Performance of Normal Children
FRANCES K. GRAHAM, CLAIRE B. ERNHART, MARGUERITE CRAFT,
AND PHYLLIS W. BERMAN

Brain Injury in the Preschool Child: Some Developmental Considerations:
II. Comparison of Brain Injured and Normal Children
CLAIRE B. ERNHART, FRANCES K. GRAHAM, PETER L. EICHMAN,
JOAN M. MARSHALL, AND DON THURSTON


Vol. 77, No. 10 


Whole No. 573, 1963 


Psychological Monographs: General and Applied 
BRAIN INJURY IN THE PRESCHOOL CHILD: SOME 
DEVELOPMENTAL CONSIDERATIONS: 


I. PERFORMANCE OF NORMAL CHILDREN ¹


FRANCES K. GRAHAM 
University of Wisconsin 
CLAIRE B. ERNHART, MARGUERITE CRAFT 
Washington University 


AND 


PHYLLIS W. BERMAN 
University of Wisconsin 


Procedures were developed to measure vocabulary skill, conceptual ability, per- 
ceptual-motor ability, and personality characteristics of preschool age children. 
Particular procedures were selected either because they had successfully differ- 
entiated brain injured from normal adults or because they measured functions 
relevant to theoretical questions concerning the brain injured child. The effects 
of age, sex, status group, and their interactions were determined in a balanced 
sample of 108 normal children and statistical adjustments for significant effects
were cross validated with an additional 137 children. Homogeneity of variances,
linearity of regressions with age, and normality of distributions were considered
in devising and standardizing scores. Reliability measures and intercorrelations
were also obtained.


Studies of brain injured adults and of
brain injured, school age children have 
indicated that behavior tests may be useful 
both in diagnostic evaluation of individual pa- 
tients and in research attempting to relate 
functions and neurophysiological or neuro- 
anatomical variables. Preschool brain injured 
children have seldom been studied, however, 
although accidents and diseases capable of 
injuring the brain are probably more frequent 
in this age group than in any other. We do 
not know whether the consequences of injury 
to the young brain are more or less devastat- 
ing than a comparable injury to the mature 
brain. We know little of the course of re-
covery from injury or of the pattern of defects
that may be related to age at the time of in-
jury, to the time elapsed since injury, or to
the age at which a child is examined. These
are problems of theoretical as well as of prac-
tical interest and they have stirred speculation
but little empirical investigation.

1 This work was supported by Research Grant
B-1550 from the National Institute of Neurological
Diseases and Blindness of the National Institutes of
Health, United States Public Health Service. The
study was made possible by the generous cooperation
of the Madison Department of Public Health; the
Madison Neighborhood Centers; the Madison Salva-
tion Army Nursery School; the St. Louis Maternity
Hospital; and by the many individual physicians,
teachers, and parents who contributed their help and
time.

The present study was undertaken as a 
necessary first step toward such investigation. 
Its purpose was to develop standardized meth- 
ods of measuring certain kinds of behavior 
which clinical reports about children and re- 
search with adults have indicated may be 
relevant in studying brain injury. An attempt
was made to sample functions thought to be 
especially impervious to impairment as well 
as those thought to be especially vulnerable. 

In devising test methods the procedure 
wherever possible was to adapt to the pre- 
school level tests that had been used suc- 
cessfully with older groups to differentiate 
between the brain injured and the non-brain- 
injured. Selection and development of pro- 
cedures were also influenced by the decision 
to limit testing to a period of less than 1 hour, 
to use simple portable test materials, and to 
use procedures likely to appeal to this age
group. The procedures selected included meas-
ures of vocabulary skill, conceptual ability,
perceptual-motor ability, and personality 
characteristics. 

A vocabulary measure was obtained both 
to provide a possible measure of the general 
level of functioning, which could be “held con- 
stant,” and because it is a dependent variable 
of special interest. In brain injured adults 
vocabulary is relatively insensitive to impair- 
ment following brain injury and may be used 
to estimate the prior level of functioning. 
Whether, with injury to the young brain, vo- 
cabulary skill also remains relatively unaf- 
fected is not known, but the question is an 
important one. Hebb (1949) has suggested, 
on theoretical grounds, that vocabulary ability 
would be more affected by an injury occurring 
early in life than by damage to the
adult brain (pp. 289-294). 

In contrast to vocabulary skill, conceptual 
and perceptual-motor abilities are likely to be 
impaired by brain injury in the adult. It is 
questionable, however, that this same pattern 
of differential impairment exists in the young 
child. From animal studies it is clear that at 
least some functions are less impaired when 
the immature brain is injured than when the 
adult brain is injured. Kennard (1938, 1942) 
found less motor defect in young than in adult 
monkeys and Benjamin and Thompson (1959) 
found less sensory defect in young than in 
adult cats when comparable lesions were made 
in the relevant cortical projection areas of 
both young and adult animals. It also appears 
likely that the kind and degree of defect is 
related not only to the age at which injury 
occurred but also to the age at which perform- 
ance is studied. Kennard found that some mo- 
tor abnormalities were delayed in making 
their appearance. Teuber and Rudel (1962) 
have recently demonstrated that performance 
in a complex perceptual-motor task changes 
systematically with age, between 5 and 18 
years, depending upon when birth injured 
children are tested. 

Personality characteristics have also been 
of interest, particularly in discussions of the 
brain injured child. Some writers have felt 
that there is a personality syndrome, consist-
ing of such traits as hyperactivity, impul-
sivity, and distractibility, which is virtually
pathognomonic of brain injury (Bakwin,
1949; Bender, 1956; Bradley, 1955; Clements 
& Peters, 1962; Eisenberg, 1957; Laufer, 
Denhoff, & Solomons, 1957; Levy, 1959; Sil- 
ver, 1958; Strauss & Kephart, 1955). The 
evidence for such a relation between per- 
sonality characteristics and brain injury is 
meager, but the hypothesis warrants investiga- 
tion. In the present study traits hypothesized 
as typical of the brain injured child were con- 
trasted with other personality traits which 
might be considered symptoms of maladjust- 
ment. 

These various procedures were administered 
to a normal control sample. The problem of 
selecting controls who are comparable to a 
preschool brain injured group is complicated 
by the fact that many brain injured children 
are mentally retarded. The control sample 
must then consist either of younger children 
with the same mental age as the brain injured 
retardates or of children who are mentally 
retarded but not brain injured. Neither of 
these solutions is entirely satisfactory. In the 
first case control and brain injured samples 
differ in chronological age as well as in brain 
damage. In the second case there is a ques- 
tion whether or not non-brain-injured (en- 
dogenous) retardates and brain injured (ex- 
ogenous) retardates can be distinguished on 
grounds that are not circular, that is, are 
independent of the behavior to be studied. 

The present study avoided these difficulties 
by limiting the brain injured sample to chil- 
dren whose IQ was above 50 and, in the ma- 
jority of cases, above IQ 69. Both the control 
and brain injured samples were selected on 
the basis of meeting criteria for normality or 
for brain injury, respectively. The criteria for 
normality are given below and the second 
section of this report will deal with the criteria 
of brain injury. Both samples consisted of 
children living in their own homes, coming 
from a variety of educational and socioeco- 
nomic backgrounds, and between the ages of 
2 years and 5.5 years. 

Even when control and brain injured sam- 
ples are selected in a comparable manner, they 
must still be equated for variables such as sex 
and socioeconomic status. If these variables, 
which are irrelevant to our main purpose, 
affect performance, their effects must be con-
trolled. Matching control and brain injured
groups is a possible means of achieving con- 
trol but it is a restricting solution which does 
not permit statements to be made about the 
performance of an individual child. It is also 
cumbersome if several groups are to be com- 
pared. A more satisfactory solution is to con- 
trol by weighting statistically for any signifi- 
cant effects of irrelevant variables. 

The present paper describes the test proce- 
dures used, the administration and scoring of 
them, reliability, and the methods of control- 
ling for the variables age, sex, and status 
group. Briefly, the effect of these latter varia- 
bles on performance was first determined in 
a balanced sample of normal children. When 
they were shown to have a significant effect, 
weights were calculated which would permit 
performance to be measured independently of 
the effect. The weights were cross validated 
on a second normal sample. The consideration 
in devising scoring systems and methods of 
weighting was whether or not they resulted 
in distributions which would meet certain sta- 
tistical criteria; no attempt was made to de- 
vise a scoring system which would differentiate 
brain injured from non-brain-injured subjects. 
Whether or not there were differences in the 
performance of brain injured and normal chil- 
dren was then tested in the second portion of 
this study. 


METHOD 
Test Procedures ²


The various procedures were administered individ- 
ually to each child in a single session of about an 
hour’s duration. Testing was carried out by one of 
five psychologists who were trained to administer the 
particular battery of tests. Except for hospitalized 
cases, the mother was generally present during testing, 
but was occupied with sorting cards for a personality 
questionnaire. 


2 A limited supply of detailed descriptions of test
materials and administration and scoring instructions 
can be obtained from the senior author upon request. 
Please specify whether all or particular procedures 
are of interest. 

3 The majority of the examinations were conducted
by Claire B. Ernhart, Marguerite Craft, and Phyllis
W. Berman. Eleanor Kenney and R. Lefton also
served as examiners and the authors wish to express
appreciation for their assistance.


Vocabulary Scale 


This scale was composed of verbal subtests from 
the 1937 Stanford-Binet Intelligence Scale. It included 
the Picture Vocabulary and Definition tests from 
Forms L and M and the Word Vocabulary test from 
Form L. Items were administered and scored accord- 
ing to instructions in the Binet manual (Terman & 
Merrill, 1937). The mean of a subject’s mental age 
placement on each of the subtests was his vocabulary 
mental age. Where necessary to establish a basal vo- 
cabulary age, the number of words and combination- 
of-words items from the Cattell (1940) Infant Intel- 
ligence Scale were also included. 
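
The scoring rule just described can be sketched in a few lines. This is only an illustration, assuming that each subtest yields a mental age placement expressed in months; the function name and the sample values are hypothetical and are not taken from the Binet manual.

```python
from statistics import mean


def vocabulary_mental_age(placements_months):
    """Return the mean mental age placement (in months) across subtests.

    placements_months: hypothetical list holding one mental age placement
    per vocabulary subtest, expressed in months.
    """
    if not placements_months:
        raise ValueError("at least one subtest placement is required")
    return mean(placements_months)


# Hypothetical placements on three subtests (illustrative values only)
example_age = vocabulary_mental_age([42, 48, 45])
```

Averaging placements, rather than summing raw item credits, keeps the score in mental age units whichever subtests were needed to establish a basal level.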


Concepts or Block-Sort Test 


This test required the subjects to match or group 
blocks which differed with respect to three dimen-
sions: color, size, and form. There were four levels
of difficulty. Level I involved placing blocks in form 
boards; Level II required matching of blocks; and 
Levels III and IV required sorting. On Level III the 
blocks differed on one dimension and were constant 
in respect to the remaining dimensions. On Level IV 
blocks differed on two dimensions simultaneously 
while the third dimension was held constant. At each 
level all three concepts were tested in separate trials. 
The procedure is similar to that employed by other 
tests of the ability to group according to concepts 
(Goldstein & Sheerer, 1941). 


Perceptual-Motor Tests 


These included a Copy-Forms test, a Motor-Co- 
ordination test, and a Perceptual-Motor battery. The 
Copy-Forms test was not included in the Perceptual- 
Motor battery since it was reliable enough to be used 
alone. The Motor-Coordination test was not included 
because too many subjects were unable to carry out 
the task. 

Copy-Forms Test. The subjects were asked to copy 
18 simple forms presented singly. The forms varied 
in difficulty, included parts of forms as well as wholes, 
and included duplicate forms reversed in orientation. 
Drawings were scored for the accuracy with which 
they reproduced the general gestalt, the organization 
of parts, the orientation on the background, and the 
size relationships and intersection of parts. In an 
earlier paper (Graham, Berman, & Ernhart, 1960) 
the 18 forms were scaled for relative difficulty and 
scoring for the accuracy of reproduction was con- 
trasted with scoring for primitive organization. This 
type of test has had a long history in work with 
brain injured adults and older children (see, e.g.,
Bender, 1938; Benton, 1955; Graham & Kendall, 
1960; Lord & Wood, 1942). 

Motor-Coordination Test. The subjects were asked 
to draw lines between two concentric figures without 
crossing the boundaries of either. Three figures were
used, a circle, a square, and a triangle, with the dis- 
tance between the concentric forms 54, 4, and % 
inches, respectively. The raw score on each figure was
the total length of lines drawn outside of the bound-
ing figures.

Perceptual-Motor Battery. This consisted of four 
subtest procedures: Figure-Ground subtest, Tactual- 
Localization subtest, Mark-the-Cars subtest, and 
Peripheral-Distraction subtest. Two scores were ob- 
tained for the Distraction subtest and for the Mark- 
Cars subtest. 

The Figure-Ground subtest is similar to the test 
developed by Strauss and Lehtinen (1947) for use 
with older brain injured children. It required the 
subjects to identify the 35 objects from the Binet 
Picture Vocabulary, Forms L and M, when these 
were reproduced on distracting backgrounds. The 
stimulus cards were presented singly and there was 
no time limit. The subtest was scored for the number 
of errors in identifying objects which were correctly 
identified during prior administration of the unaltered 
Vocabulary scale. References to the background were 
occasionally given as a response but because such 
responses occurred with low frequency they were not 
dealt with separately in the formal scoring system. 

The Tactual-Localization subtest was an adapta- 
tion of Bender’s “Face-hand” test (Fink & Bender, 
1953). Objects in the environment, a part of the 
child’s body, or two parts simultaneously, were 
touched by the examiner. The child was asked to 
identify the object or body part which had been 
touched. The test was administered with the child’s 
eyes open if he would not respond with eyes closed. 
Difficulty was a function of variations in the part and 
side of the body which was stimulated. 

The Mark-the-Cars subtest required the subjects 
to place a mark on each of 10 drawings of cars dis- 
tributed among drawings of 20 other objects. The 
intent was to measure the ability to maintain a set 
in the face of distracting conditions. Two trials were 
given and two scores were recorded. In addition to
a score for the accuracy (Mark-Car Accuracy score) 
with which only cars were marked, a second score 
rated the maturity of the markings employed (Mark- 
Car Mark score). 

The Peripheral-Distraction subtest involved placing 
a cardboard vase of flowers in the center of a “table” 
(a sheet of 8 X 11 white paper). The position of 
peripheral distracters (a cardboard glass and plate) 
was varied in the four trials which were given. The 
test was scored for the accuracy with which the 
flowers were placed in the center of the paper (Dis- 
traction-Variable Error) and for the extent of devia- 
tion towards or away from distracters (Distraction- 
Constant Error). 
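
One way to picture the two Peripheral-Distraction scores is sketched below. The formulas are assumptions made for illustration, not the scoring system actually used: the accuracy score is taken as the mean unsigned distance of the placement from the center of the paper, and the directional score as the mean signed displacement along the line from the center to the distracter. Coordinates, units, and the function name are hypothetical.

```python
import math


def distraction_scores(trials):
    """Return (mean_unsigned_error, mean_signed_error) over a set of trials.

    trials: list of (placement_xy, distracter_xy) pairs, with the center of
    the paper taken as the origin (0, 0). The signed error is positive when
    the placement is displaced toward the distracter and negative when it is
    displaced away from it.
    """
    unsigned, signed = [], []
    for (px, py), (dx, dy) in trials:
        # Distance of the placement from the center, ignoring direction.
        unsigned.append(math.hypot(px, py))
        # Projection of the placement onto the unit vector that points
        # from the center toward the distracter.
        signed.append((px * dx + py * dy) / math.hypot(dx, dy))
    n = len(trials)
    return sum(unsigned) / n, sum(signed) / n


# Two hypothetical trials: equal misses toward and away from the distracter
accuracy, direction = distraction_scores([((1, 0), (4, 0)), ((-1, 0), (4, 0))])
```

In this example the unsigned error averages 1.0 while the signed error averages 0.0, which shows why a subject can be inaccurate without showing any systematic pull toward the distracters.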


Personality Measures 


Personality characteristics were measured by ex- 
aminer ratings and by use of a parent questionnaire. 
The intent was not to obtain a comprehensive picture
of the child’s adjustment but rather to measure par-
ticular personality traits and especially to contrast
traits thought to be typical of brain injured children 
with other unfavorable traits. 

Parent Questionnaire. This consisted of 209 items
typed on individual cards. Each item described a kind
of behavior and the parent was asked to sort the item 
as being like, unlike, or questionably like his child. 
An alternate form was used to obtain test records by 
mail. Items were written on the basis of clinical de- 
scriptions of the brain injured child and from knowl- 
edge of the kind of behavior likely to be exhibited 
by children in this age group. The test was designed 
primarily to cover behavior typical of 3-year-olds and 
some of the items are less applicable to the older pre- 
school child. 

There were 14 subscales. One subscale provided
checks on general factors determining test response; 
6 composed a Brain-Injury scale, and 7 composed a 
Maladjustment scale. The 6 Brain-Injury subscales 
consisted of items describing hyperactive behavior, 
aggressive behavior, emotional behavior, demanding- 
ness, unpredictable behavior, and behavior which was 
the opposite of these characteristics. This last sub-
scale was called Temperateness. The 7 Maladjustment
subscales contained items describing inactive behav- 
ior, infantile behavior, negativism, compulsive be- 
havior, fearfulness, miscellaneous “neurotic” symp- 
toms which were labeled “inward-directed” behavior, 
and 1 subscale with items describing the opposite of 
such behaviors. This was labeled Adjustment or In- 
dependence subscale. 

Examiner Ratings. These were made for eight 
traits. Four were classified as Brain-Injury ratings— 
hyperactivity, demandingness, distractibility, and im- 
pulsivity. Four were classified as Maladjustment 
ratings—infantilism, negativism, fearfulness, and 
compulsivity. 


Subjects 


Subjects were obtained through the cooperation of 
medical facilities and nursery schools in St. Louis, 
Missouri, and Madison, Wisconsin. Children used as 
normal controls were selected to meet specific criteria. 
They were all products of single full-term births 
which had occurred without complications or opera- 
tive intervention. Subjects were not included in the 
normal group if they were known to have behavior 
problems or if there was any history of neurological 
disease, convulsions, head injuries or other evidence 
suggesting injury to the brain, either pre- or post- 
natally. Children with any kind of congenital defect 
were also excluded. In about two thirds of the cases 
medical records were available. For the remaining 
cases the mother was the major source of information. 

Except for children examined in nursery schools all 
subjects were volunteers. When a subject had been 
selected for the study, a letter was sent to the mother 
requesting her cooperation in a study of the devel- 
opment of normal children. Approximately one half 
of the subjects solicited responded to this appeal but 
if a subject did not respond, no further effort was 
made to obtain cooperation. During the latter part 
of the study, the procedure was modified for families 
with telephone listings. The introductory letter re- 
quested cooperation and explained that a psychologist 
would telephone to discuss the study further and to 
make a definite appointment. 




Initially, prospective subjects were selected at ran- 
dom from the cases in the files of the St. Louis Ma- 
ternity Hospital who met the criteria and were, at 
the time of testing, within the age range desired. Sub- 
sequently, prospective subjects were also obtained 
through the several Madison agencies. All children 
listed with a given agency who met the criteria of 
normality as far as the agency’s files provided infor- 
mation, and who were of the age, sex, and status 
group needed to complete the design, were asked to 
cooperate with the study. 

Two groups of normal subjects were obtained, a 
balanced sample and a cross-validation sample. The 
balanced sample consisted of 108 children, 18 in each 
of 6 age groups, proceeding by half-year steps from 
2.5 years to 5.5 years. In each half-year group there 
were 9 boys and 9 girls, of whom 3 of each sex were 
Negro and 6 were white. Half of the white children 
were under private medical care and half were under 
clinic care; subjects under clinic care were not ob- 
tained from the most deprived socioeconomic levels. 
Since only a few Negro children were entered as pri- 
vate patients, the private-clinic distinction was not 
made for Negro subjects. The first 108 children of
appropriate sex, age, and status group who completed 
the full test battery constituted the balanced sample. 
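
The arithmetic of the balanced design can be checked with a short sketch. The cell quotas are taken from the description above; the variable and function names are hypothetical.

```python
from itertools import product

# Lower limits of the six half-year age groups in the balanced sample
AGES = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
SEXES = ["male", "female"]
# Children required per status group within each age-by-sex cell
STATUS_QUOTA = {"white private": 3, "white clinic": 3, "Negro": 3}


def balanced_design():
    """Return a dict mapping (age, sex, status) to the number of children required."""
    return {
        (age, sex, status): quota
        for age, sex in product(AGES, SEXES)
        for status, quota in STATUS_QUOTA.items()
    }


design = balanced_design()
total_children = sum(design.values())  # 6 ages x 2 sexes x 9 children
```

With 36 cells of 3 children each, the total comes to 108, and every age group contains 9 boys and 9 girls.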

Additional children were examined to obtain a 
cross-validation sample. The age, sex, and status 
group of the combined normal samples are shown in 
Table 1. It should be noted that the total sample 
was not used for most of the analyses to be reported. 
Cross validation of regression equations and computa- 
tion of final regression weights were completed before 
the last 10-15 cases had been obtained. All available 
cases were added in subsequent analyses of the differ- 
ences between normal and brain injured samples, but 
the full test battery was not obtained in every case. 
Reasons for incomplete batteries were varied. Some 
procedures were revised during the course of the 
study and earlier forms had then to be eliminated. 
Some subjects did not cooperate in taking every test 
and a “refusal” was scored in these instances. Occa-
sionally examiners forgot or incorrectly administered
a procedure. In addition, 20 subjects were given two
tests, the Copy-Forms and Block-Sort tests, in a 
changed order. This was part of a separate experi- 
ment on the effects of presentation order, and per- 
formance on these procedures could not be used for 
cross validation. The remaining procedures were 
given in the usual manner, however. Results for 
2-year-olds could not be handled in the same manner 
as those of older subjects and, in the final compari- 
sons, 2.5-year-old subjects were also excluded. The 
exact Ns used in the various analyses are shown in
the relevant tables.

An additional 19 subjects were examined who could
not be included in the normal sample because they 
did not meet one or another of the criteria. Usually 
this was due to an error in recording a birthdate or 
to new information obtained from the mother which 
supplemented the medical history. The reasons for 
exclusion are shown in Table 2. 


RESULTS 
Variables Affecting Performance 


To test the effects of age, sex, and status 
(white private, white clinic, and Negro), an 
analysis was made of the variance associated 
with these factors in the 108 cases described 
as the balanced normal sample. It was also 
necessary to test for significant interactions 
among the factors since methods of adjusting 
for effects would be different if interactions 
were present. 

TABLE 1

DISTRIBUTION OF AGE, SEX, AND STATUS GROUP IN THE COMBINED NORMAL SAMPLES

                     Lower limit of 6-month age interval
Group               2   2.5     3   3.5     4   4.5     5   Total N

M W Pr              9     6     7     8     8     6     4        48
M W Cl              2     4     4     4    11     7     7        39
M N                 1     8     6     5     9    10     7        46
Total N            12    18    17    17    28    23    18       133

F W Pr              4     5     8     7     8     7     3        42
F W Cl              0     3     5     4     8     8     4        32
F N                 3     5     7     8     5     6     4        38
Total N             7    13    20    19    21    21    11       112

Grand total N      19    31    37    36    49    44    29       245

Note. M = male, F = female, W = white, N = Negro, Pr = private, Cl = clinic.


TABLE 2

SUBJECTS EXCLUDED FROM THE NORMAL SAMPLES

Reason for exclusion                        Frequency

Japanese parentage
Age over 5-8
Behavior problem
History of febrile or other convulsions
Fetal difficulties
Premature
Caesarian section birth
Congenital defects
Cerebral palsy

Total N                                            19


Before carrying out these analyses on the
nonpersonality procedures, distributions by
age were plotted for each measure. It was
expected that age would be a significant varia-
ble determining performance on these tests
and the scoring systems were in part devised
to reflect age changes. It was not possible to
know in advance whether or not the relation- 
ship with age would be linear, however, and 
whether the tests would show homogeneous 
variance throughout the age range. It would 
be fortunate if this were so but a more likely 
possibility would be skewness in one or both 
directions if a test were too difficult for the 
youngest children or too easy for the oldest. 

On the basis of graphed distributions, revi- 
sions were made in the scoring systems and 
transformations of scores tested for their 
ability to yield linear age regressions, with 
normally distributed scores at each half-year 
age step. Bartlett’s test for homogeneity of 
variance showed that, with the exception of 
Copy-Forms and Mark-Car Accuracy, no sig- 
nificant heterogeneity remained. In the case 
of Copy-Forms, the lack of homogeneity was 
due to the 2.5-year-olds. The test was too
difficult for this age and the majority of chil- 
dren were unable to obtain any credit on the 
test. Since no transformation of scores could 
improve this situation, the analysis of age, 
sex, and status effects was confined to the 
remaining portion of the age range in which 
the variance did not show significant hetero- 
geneity. The lack of homogeneity in the Mark- 
Car Accuracy scores was due to the opposite 
phenomenon—after the age of 5 the test was 
too easy and did not differentiate among sub- 
jects. The analysis of variance was similarly 
confined to the age range in which the de- 
mands of the statistical model could be 
satisfied. 
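
For readers who wish to see the homogeneity check in computational form, the following is a minimal sketch of Bartlett's statistic written with the standard library only. The grouping of scores by half-year age step and the sample values are hypothetical; the statistic is referred to a chi-square distribution with k - 1 degrees of freedom, and the lookup of the critical value is left to a table.

```python
import math
from statistics import variance


def bartlett_statistic(groups):
    """Return (statistic, degrees_of_freedom) for Bartlett's test.

    groups: list of lists of scores, one list per age group. Each group
    needs at least two scores and a nonzero variance, since the statistic
    takes the logarithm of every group variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]  # unbiased sample variances
    n_total = sum(sizes)
    # Pooled variance, weighting each group by its degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction, k - 1


# Hypothetical scores for three age groups with identical spread
equal_stat, equal_df = bartlett_statistic([[1, 2, 3], [11, 12, 13], [4, 5, 6]])
# Hypothetical scores for two age groups with very different spread
unequal_stat, unequal_df = bartlett_statistic([[1, 2, 3], [0, 10, 20]])
```

A group in which nearly every child receives the same score, as with the youngest children on Copy-Forms, has a variance near zero and inflates the statistic sharply; a variance of exactly zero cannot be handled by this sketch, which is one more reason for restricting the age range.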

Analyses of the variance in performance on
these tests that was associated with age, sex,
and status are shown in Table 3. Age ac-
counted for the largest amount of the variance
and was a significant variable on all tests ex- 
cept Distraction-CE. Sex or status-group ef- 
fects were significant with three of the tests. 
None of the interactions was significant. 

Motor-Coordination required special handl- 
ing. A large number of children was unable 
to complete or to receive a score for one or 
more of the three test items. The typical un- 
scorable performance occurred when a child 
scribbled over the form without making any 
effort to follow directions and to stay between 
the boundary lines. Various scoring systems 
were devised to penalize such a performance 
but none of these yielded satisfactory distri- 
butions when combined with the scorable re- 
sults. Data were thus restricted to subjects 
who completed all three items and there were 
too few of these to permit use of the balanced 
three-way analysis of variance test. Instead, 
it was assumed from inspection of means that 
age was a significant variable. Then, having 
adjusted for age, a two-way analysis for sex 
and status effects was made using 54 cases 
with nine subjects in each of the six sex-status 
cells. Neither effect was significant. 

TABLE 3

F RATIOS FROM ANALYSES OF VARIANCE DUE TO AGE,
SEX, AND STATUS GROUP

                                          Main factor
Test                  Age range       Age        Sex     Status

Vocabulary              2.5-5.5   33.75**       3.19       2.70
Block-Sort              2.5-5.5   19.94**       2.94      4.78*
Copy-Forms              3.0-5.5   25.65**     7.75**     7.83**
Motor-Coordination      3.0-5.5      -- a        .46       1.74
Perceptual-Motor
  Figure-Ground         2.5-5.5     2.36*        .24        .88
  Localization          2.5-5.5   18.45**       1.04       2.66
  Mark-Car Accuracy     2.5-5.0     4.06*       3.47       2.05
  Mark-Car Mark         2.5-5.5   14.21**    16.82**       2.55
  Distraction-VE        2.5-5.5    5.24**        .14       2.65
  Distraction-CE        2.5-5.5      2.07        .39        .61

Note. For age range 2.5-5.5, N = 108; for Motor-Coordina-
tion, N = 54; for other restricted age ranges, N = 90. There
were no significant interaction effects.
a Deemed significant by inspection of means (see text).
*p < .05.
**p < .01.


Tables 4 and 5 give the results of these
analyses of variance for the Parent Question-
naire subscales and for the Examiner Ratings.
On the Parent Questionnaire, status group sig- 
nificantly affected the score on four of the 
subscales. In each of these subscales, the 
scores of the Negro group were higher. This 
same pattern was seen on other subscales as 
well, although the difference did not attain 
significance. The questionnaire probably was 
not interpreted in the same way by parents of 
Negro children as by parents of white chil- 
dren, although it is unlikely that response 
compliance accounts for the difference. Negro 
parents tended to rate their children more un- 
favorably on all subscales, whether unfavora- 
ble judgment required agreement or disagree- 
ment with the test item. 

Neither sex nor age showed a consistent 
relation to subscale scores. As would be 
expected, Independence scores increased with 
increasing age. There was also a significant 
interaction of sex and age on the Infantilism 
subscale. The youngest boys were rated as 
more infantile than older boys and more in- 
fantile than girls of any age. The oldest girls, 
however, were more infantile than boys of the 
same age. Both male and female 2.5-year-olds
were less predictable than older children but
there was no consistent age or sex relation 
beyond age 2.5. 

On the Examiner Personality Ratings (see 
Table 5), the only significant main effect was 
that of sex. Males were rated significantly 
higher on activity, demandingness, distracti- 
bility, and impulsivity. For demandingness, 
distractibility, and infantilism, the sex effect 
varied with age, that is, younger males tended 
to be rated higher. For infantilism and nega- 
tivism, sex and status interacted. Negro fe- 
males tended to be less infantile and white 
private males less negativistic than other 
groups. The Impulsivity rating was adopted 
too late in the study to be used in the balanced 
sample analysis. However, inspection of the 
ratings suggested that a sex difference was 
present in these ratings also and comparison of 
two groups of 36 subjects each, matched for 
age and status, confirmed the significant sex 
effect. The Compulsivity rating was dropped
after studies of interscorer reliability revealed
that there was low agreement on the rating. It
should be noted that most of the ratings were
made by female examiners; the effects might
be different had male examiners been used.


TABLE 4

F RATIOS FROM ANALYSES OF THE VARIANCE IN PARENT QUESTIONNAIRE SUBSCALES DUE TO AGE,
SEX, AND STATUS GROUP

                              Main factor                          Interaction
Subscale               Age      Sex   Status   Sex x Age  Sex x Status  Age x Status        Triple

Brain-Injury
  Hyperactivity        .26      .54   6.23**        1.30           .18          1.47           .70
  Aggressiveness      1.00      .62      .29         .62           .92           .66           .63
  Emotionality         .28     2.20      .18         .88          1.58           .62           .12
  Demandingness        .13      .01     1.10        1.53   [illegible]           .83           .80
  Unpredictability    1.96      .84     2.36      3.76**           .75          1.57   [illegible]
  Temperateness        .53      .31      .90        1.77           .84           .53   [illegible]
Maladjustment
  Inactivity          1.97     1.34     1.18         .13           .04   [illegible]   [illegible]
  Infantilism         1.42      .27    4.04*       2.36*           .72   [illegible]   [illegible]
  Negativism           .21      .01     1.50        1.49           .64   [illegible]   [illegible]
  Compulsiveness      1.11     1.14    4.78*         .48           .41   [illegible]   [illegible]
  Fearfulness         1.17      .22     1.29        1.10           .43   [illegible]   [illegible]
  Inwardness          1.36      .37    4.58*        1.03           .82   [illegible]   [illegible]
  Independence      5.07**      .69     1.89        2.17           .29   [illegible]   [illegible]
Buffer scale          1.56      .08      .26         .54           .12           .26           .37

Note. N = 108.
*p < .05.
**p < .01.


TABLE 5

F RATIOS FROM ANALYSES OF THE VARIANCE IN EXAMINER RATINGS DUE TO AGE, SEX, AND
STATUS GROUP

                              Main factor                          Interaction
Rating                 Age      Sex   Status   Sex x Age  Sex x Status  Age x Status        Triple

Brain-Injury
  Activity             .24  11.67**      .94        1.86           .49           .80           .98
  Demandingness       1.20   8.33**      .90       2.50*           .40          1.34           .59
  Distractibility      .32    5.78*      .08       3.16*           .17          1.14           .90
  Impulsivity a         --    6.25*       --          --            --            --            --
Maladjustment
  Infantilism  [illegible]      .78     1.76       3.21*         3.55*           .93           .96
  Negativism          1.46     3.40      .39         .40         3.21*           .26           .75
  Fearfulness          .91      .01      .34         .61          1.32          1.45           .95
  Compulsiveness     Not done (interscorer reliability too low)

a Tested by matched groups (N = 72).
*p < .05.
**p < .01.


Adjustments for Variables Affecting Per- 
formance 


The previous analyses determined which 
variables significantly affected performance on 
the various procedures. By adjusting statis- 
tically for such effects, it is possible to char- 
acterize the test performance of a subject by 
a score which is independent of the effects. 
This was done in one of two ways. For the 
nonpersonality tests, weights were determined 
from regression equations. For the person- 
ality measures, weights were determined which 
would equalize marginal totals. 

Nonpersonality Measures. Since interaction
among age, sex, and status did not affect per-
formance on the nonpersonality tests, data
from the total normal sample were used to
estimate regression weights. Weights were 
estimated for a variable when its effect was 
significant at the .05 level or, if age accounted 
for the major portion of the variance, when 
the effect of a second variable was significant 
at the .10 level. Since the presence of a varia-
ble with a large effect makes it less likely that
a significant effect will be demonstrated for
a minor variable, the required significance
level was relaxed from .05 to .10 under these circumstances.


There would be no gain in weighting for 
chance variation, but it was important that 
a variable should not be deemed insignificant 
on the basis of an insensitive test. 

To use linear regression methods in estima- 
tion, two conditions should be met—the sub- 
group variances should be homogeneous and 
the regression should be linear. Failure to 
detect significant heterogeneity on Bartlett’s 
test was considered satisfactory evidence that 
the first requirement had been met. Linearity 
of the age regression was determined from in- 
spection of the graphed distributions. In gen- 
eral, while the 2.5-year age group did not 
differ significantly in variance from the re- 
maining age groups, the improvement in per- 
formance from 2.5 to 3.0 years was larger than 
the improvement over any later 6-month pe- 
riod. The age-performance relationship was 
consequently not linear over the full 2.5-5.5 
year age range, but was linear from 3.0 to 5.5. 
Since further score transformations which 
would remedy this situation could not be de- 
vised, the 2.5-year results were not employed 
in determining regression weights, except with 
the Distraction and one of the Mark-Car sub- 
tests. To achieve linearity, it was also neces- 
sary to eliminate the 5-year group in esti- 
mating regression weights for the Localization 
and for the Mark-Car Accuracy subtests.

Regression equations were first calculated 




Brain Injury IN THE PRESCHOOL CHILD 9 


using the balanced sample. Sex and status 
were treated as categorical variables. Values 
of 0 and 1 were assigned to sex and the three
status means were examined. If they were 
roughly equidistant, the values 0, 1, and 2 
were assigned. If the two white groups were 
close or showed a reversal, with scores of 
clinic subjects being superior to those of pri-
vate subjects, the values assigned were 0 for
the Negro group and 1 for both private and 
clinic white groups. The reliability of the re- 
gression weights calculated from the balanced 
sample was then estimated by applying them 
to the cross-validation sample. All regression 
weights were successfully cross validated, 
there being no significant differences in the 
mean residual scores of the two samples. A 
final regression equation was computed based 
on the combined samples.

Table 6 shows for each test the regression 
weights, in standard score form, and R²,
which gives the proportion of the variance in 
test score accounted for by the combination 
of weighted variables. From the regression 
weights, a child’s performance can be pre- 
dicted on the basis of age, sex, and status 
group. The discrepancy between the predicted 
performance and the obtained performance is 
then a residual score adjusted for these ir-
relevant variables. The residual variance
represents the extent to which individual 
differences and unknown factors determine 
performance. 
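
The adjustment described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of test score on age, sex, and status codes; the data, the function names, and the use of raw rather than standardized coefficients are hypothetical and are not taken from the study.

```python
def fit_least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def residual_scores(rows, y, coefs):
    """Obtained score minus the score predicted from age, sex, and status."""
    return [yi - (coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)))
            for r, yi in zip(rows, y)]


# Hypothetical data: (age in years, sex coded 0/1, status coded 0/1/2).
rows = [(3.0, 0, 0), (3.5, 1, 1), (4.0, 0, 2), (4.5, 1, 0),
        (5.0, 0, 1), (5.5, 1, 2), (3.0, 1, 2), (5.5, 0, 0)]
scores = [10.0, 14.0, 18.0, 17.0, 21.0, 27.0, 13.0, 22.0]

coefs = fit_least_squares(rows, scores)
residuals = residual_scores(rows, scores, coefs)
print([round(r, 2) for r in residuals])
```

By construction the residuals sum to zero and are uncorrelated with each predictor in the fitted sample, which is the sense in which the residual score is independent of age, sex, and status.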

Parent Questionnaire. All subscales of the 
Parent Questionnaire were adjusted for status 


because of the consistent finding that Negro 
parents rated their children more unfavorably 
than did white parents. No additional adjust- 
ments were made for the significant interac- 
tions of sex and age with the Infantilism and 
Unpredictability subscales. These interactions 
were complexly determined and weighting 
would have been based on relatively small 
subgroups. Since weights based only on status 
were successfully cross validated, failure to 
make the additional adjustments apparently 
did not increase variability of the residual 
scores to a significant extent. To simplify the 
problem of adjustments, the Independence 
subscale, which was significantly associated 
with age, was excluded from further analyses. 

Adjustments were obtained by equalizing 
the means of the three status groups (white 
private, white clinic, and Negro) in the bal- 
anced sample. As with the nonpersonality 
tests, the weights were cross validated in the 
remaining normal sample and the two sam- 
ples were then combined to obtain the final 
adjustments. 

Examiner Ratings. Except for the Fearful- 
ness rating, which required no adjustment, all 
Examiner Ratings were higher for boys than 
for girls. In the case of the Activity and Im- 
pulsivity ratings, weights which would equal- 
ize sex means were calculated from the bal- 
anced sample and cross validated in the usual 
manner. Adjusting the remaining four ratings 
was a more complex process because of sig- 
nificant interactions of sex with age or status. 

The procedure was to inspect the 36 cell 


TABLE 6


STANDARD PARTIAL REGRESSION COEFFICIENTS


                                              Regression coefficient
Test                  Age range   N     Age           Sex    Status   R²
Vocabulary            3.0-5.5     178   .69           .09    .20      .48
Block-Sort            3.0-5.5     148   [illegible]   .16    .30      [illegible]
Copy-Forms            3.0-5.5     119   .72           .14    .22      [illegible]
Motor-Coordination    3.0-5.5     98    −.59          —      —        [illegible]
Perceptual-Motor
  Figure-Ground       3.0-5.5     172   [illegible]   —      —        [illegible]
  Localization        3.0-5.0     141   .51           —      .12      [illegible]
  Mark-Car Accuracy   2.5-5.0     178   .58           —      —        [illegible]
  Mark-Car Mark       3.0-5.5     178   [illegible]   .23    .28      [illegible]
  Distraction-VE      2.5-5.5     199   [illegible]   —      —        [illegible]
  Distraction-CE      2.5-5.5     199   −.28          —      —        .06




means of the analyses of variance to deter- 
mine which cells or combinations of cells ac-
counted for the interactions. For example,
the sex by age interaction affecting Distracti- 
bility arose because boys were, on the whole, 
judged to be more distractible than girls, but 
this was especially true of the younger boys. 
Three subgroups were therefore formed: girls, 
boys aged 2.5-3.99, and boys aged 4.0-5.5. 
Weights which would equalize the means of 
these three groups were then tested on the 
cross-validation sample. If residual scores did 
not differ in the cross-validation sample, the 
weighting system was retained with final 
weights being computed from the combined 
normal samples. If the weights did not cross
validate, a new grouping was formed and the 
process repeated. In the case of Distractibility 
the original weighting was not successful and 
inspection of means revealed that 2.5-year- 
olds in the cross-validation group differed 
from those in the balanced sample. This 
youngest group was, consequently, excluded 
from both samples and weights were computed 
only for the age range 3.0-5.5. Because an 
empirical method was used, in part, to deter- 
mine the weighting systems, there can be less 
confidence than with other measures that 
cross validation measures the reliability of the 
adjustments. 
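
A minimal sketch of the equalizing step follows. It assumes the weights are additive offsets that move each subgroup mean to a common value; the text does not state their exact form, and the ratings, group labels, and function names are hypothetical. The study compared residual scores by significance test, whereas this sketch only reports the remaining spread of subgroup means.

```python
from statistics import mean


def equalizing_offsets(scores_by_group):
    """Additive offset per group that moves its mean to the mean of group means."""
    target = mean(mean(v) for v in scores_by_group.values())
    return {g: target - mean(v) for g, v in scores_by_group.items()}


def adjust(scores_by_group, offsets):
    """Apply the offsets derived from one sample to the scores of any sample."""
    return {g: [x + offsets[g] for x in v] for g, v in scores_by_group.items()}


def spread_of_means(scores_by_group):
    """Largest difference between subgroup means; zero when means are equalized."""
    means = [mean(v) for v in scores_by_group.values()]
    return max(means) - min(means)


# Hypothetical Distractibility ratings for the three subgroups named in the text.
balanced = {
    "girls": [3, 4, 3, 2],
    "boys_2.5_to_3.99": [6, 5, 7, 6],
    "boys_4.0_to_5.5": [4, 5, 4, 3],
}
cross_validation = {
    "girls": [3, 3, 4, 3],
    "boys_2.5_to_3.99": [5, 6, 6, 7],
    "boys_4.0_to_5.5": [5, 4, 3, 4],
}

offsets = equalizing_offsets(balanced)
balanced_spread = spread_of_means(adjust(balanced, offsets))
cv_spread = spread_of_means(adjust(cross_validation, offsets))
print(balanced_spread, cv_spread)
```

The spread is zero in the sample used to derive the offsets; how far it departs from zero in the second sample indicates how well the weighting cross validates.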


Standardization of Scores 


To facilitate comparisons among tests, the 
adjusted or residual scores were converted to 
a common scale. In the present work the
convenient linear transformation to a distribu- 
tion having a mean of 50 and a standard devi- 
ation of 10 was adopted. In reporting group 
comparisons in Graham, Ernhart, Thurs- 
ton, and Craft (1962), scores were converted 
to the equivalent scale in which the mean was 
zero and the standard deviation unity. In 
both scales scores below the mean represented 
poorer performance. For the personality meas- 
ures scores below the mean were in the direc- 
tion hypothesized to be characteristic of the 
brain injured or, where there was no hypo- 
thesis, in the less desirable direction. Com- 
posite scores for the Perceptual-Motor battery 
and the personality scales were the unweighted 
means of the component subtests. 
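
The linear transformation can be sketched as follows. The residual values and the function name are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which estimate was used.

```python
from statistics import mean, stdev


def to_standard_scores(residuals, new_mean=50.0, new_sd=10.0):
    """Linear rescaling to a chosen mean and SD; the distribution shape is unchanged."""
    m, s = mean(residuals), stdev(residuals)
    return [new_mean + new_sd * (r - m) / s for r in residuals]


# Hypothetical residual scores for one test.
residuals = [-4.0, -1.5, 0.0, 0.5, 2.0, 3.0]

t_scores = to_standard_scores(residuals)                            # mean 50, SD 10
z_scores = to_standard_scores(residuals, new_mean=0.0, new_sd=1.0)  # mean 0, SD 1
print([round(t, 1) for t in t_scores])
```

The two scales carry the same information: each score on the first equals 50 plus 10 times the corresponding score on the second.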


The scores were not normalized in making 
the transformation. Gulliksen (1950, pp. 280-
281) has discussed objections to the use of
normalized scores and these are particularly 
pertinent where relatively small samples are 
concerned. Departures from normality were 
tested by chi square with 8 degrees of freedom, 
comparing frequencies at intervals of .5 stand- 
ard deviations and combining frequencies at 
extremes of the distributions so that expected 
N equaled at least five for each interval. The 
degree of skewness in the departure from nor- 
mality was estimated roughly by the equation 
3(M − Md)/s. As Table 7 shows, significant
negative skewness was present with the Block- 
Sort test and with the personality scales. 
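
The rough skewness estimate 3(M − Md)/s can be computed as in the sketch below, reading M as the mean, Md as the median, and s as the standard deviation. The scores are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, median, stdev


def median_skewness(scores):
    """Rough skewness estimate: three times (mean minus median) over the SD."""
    return 3 * (mean(scores) - median(scores)) / stdev(scores)


# Hypothetical scores: a test with a ceiling piles up cases at the top,
# so the mean falls below the median and the estimate is negative.
ceiling_limited = [10, 10, 10, 9, 9, 8, 6, 2]
symmetric = [1, 2, 3, 4, 5]

print(round(median_skewness(ceiling_limited), 2))
print(median_skewness(symmetric))
```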

Most of the subtests of both the personality 
scales and the Perceptual-Motor battery also 
showed negative skewness, although not al- 
ways to a significant degree. A further test 
was carried out to determine whether or not 
subtests which were combined to give a com- 
posite score could be said to be drawn from 
a common distribution. The sum of frequen- 
cies across subtests making up a composite 
scale was determined at intervals of .5 stand- 
ard deviations. This constituted the common 
distribution and was the basis for calculating 
expected frequencies for each of the com- 
ponent subtests. Departures from the expected 
frequency distributions were then tested by 
chi square. The Examiner Maladjustment
ratings did differ from the distribution com-
mon to the three ratings. The subtests making
up the other scales were more homogeneous in
distribution, however. With the exception of
the Figure-Ground of the Perceptual-Motor
battery, the Impulsivity rating of the Ex-
aminer Brain-Injury scale, and the Nega-
tivism subscale of the Parent Maladjustment
scale, the distributions of subtests could have
arisen from random sampling of their common
distributions.


TABLE 7


SKEWNESS AND CHI SQUARE TEST FOR GOODNESS OF
FIT TO A NORMAL DISTRIBUTION


Test                     Skewness      Chi square
Vocabulary               [illegible]   4.6
Block-Sort               −.6           34.0**
Copy-Forms               .0            3.0
Motor-Coordination       .3            5.2
Perceptual-Motor         −.3           3.1
Parent Questionnaire
  Brain-Injury           −.3           19.9*
  Maladjustment          −.3           28.5**
Examiner Ratings
  Brain-Injury           −.9           16.5*
  Maladjustment          −.9           26.1**


* p < .05, df = 8.
** p < .01, df = 8.

As noted above, the general trend was to-
wards negative skewness. In the case of the
personality measures this was a function of 
the way in which the traits were conceptu- 
alized since it was difficult to describe be- 
havior superior to normal which would be as 
extreme as behavior inferior to normal. Skew- 
ness of the Block-Sort and of the Perceptual- 
Motor subtests was associated with ceilings on 
the tests which also led to gradually decreas- 
ing variance with increasing age. Although 
these variance changes were not sufficiently 
large to be judged heterogeneous on Bartlett’s 
test, they were systematic and would result 
in systematic error if the standard deviation 
(or standard error of estimate) for the full 
age range were employed in transforming to 
standard scores. Consequently, we used the 
standard deviations of narrower age groups 
when this phenomenon appeared.⁴


Reliability 


The primary purpose of the present study 
was to develop measures which could be used 
in investigating the functioning of brain in-
jured, preschool children. From this point of
view a demonstration of significant differences 
would require that the differences between 
groups be greater than the unreliability or 
error in measuring. However, if group differ- 
ences were not significant, it might be either 
because there was no difference in perform- 
ance of the groups or because the error of 
measurement was large. It is, therefore, per- 
tinent, even for group studies, to obtain in- 
formation on reliability and especially on the 
relative reliability of different measures which 


may be compared. 


4 Regression and transformation equations can be 
obtained from the senior author. 


Several types of reliability have been con- 
sidered: measures of interexaminer agreement 
where skill is required either to make observa- 
tions or to score a subject’s performance, sin- 
gle-session reliability estimated by split-half 
correlations, and test-retest reliability after 
6 months. All measures of reliability were ob- 
tained on normal subjects. This may under- 
estimate reliability in discriminating abnormal 
from normal groups if the performance of ab- 
normal subjects extends much beyond the 
range of normal subjects. On the other hand 
test-retest correlations in brain injured chil- 
dren may vary as a function of differences in 
the time elapsed since injury and thus may 
be difficult to interpret as a measure of 
reliability. 

Interexaminer Reliability. The eight ratings
of personality characteristics were the only 
measures depending primarily upon examiner 
judgements. Correlations of two examiners 
simultaneously observing the same child but 
making independent ratings are shown in 
Table 8 for a sample of 25 subjects. The sub- 
jects ranged in age from 2 to 4.5 years. Except 
for Compulsiveness, the correlations are rea- 
sonably satisfactory for this kind of measure. 
Apparently the concept of compulsiveness was 
interpreted differently by the two examiners 
and, as used, was not a characteristic which 
could be measured meaningfully. It was not
included, therefore, in further analyses. 

TABLE 8


INTEREXAMINER RELIABILITY FOR EXAMINER RATINGS
OF 25 NORMAL CHILDREN


Rating                r
Brain-Injury
  Activity            .66
  Demandingness       .81
  Distractibility     .65
  Impulsivity         .83
Maladjustment
  Infantilism         .50
  Negativism          .60
  Compulsiveness      .14
  Fearfulness         .53


A certain amount of skill was also required
in scoring the Copy-Forms and Mark-Car
tests. A sample of 50 Copy-Forms tests, dis-
tributed among all age groups, was rated by
two judges on each of the eight possible ac- 
curacy scores. Correlations ranged from .92 
for intersections to .99 for orientation and for 
organization. Correlations between two judges 
in scoring 50 Mark-Car records obtained from 
3-year-old children in another study (Graham 
et al., 1962), were .96 for the kind of mark- 
ings employed and .90 for the accuracy of 
marking. Scoring methods were apparently 
sufficiently objective so that they did not con- 
stitute a major source of unreliability. 

Single-Session Reliability. Single-session re- 
liability is difficult to obtain for Block-Sort, 
Motor-Coordination, and the Examiner Rat- 
ings since these procedures have too little 
replication of items of comparable difficulty or 
of items similar in kind. Vocabulary, Copy- 
Forms, and the Parent Questionnaire sub- 
scales could, however, be divided into two 
halves of approximately equal difficulty. Part-
whole correlations were computed for the sub-
tests of the Perceptual-Motor battery. 

One half of the Vocabulary test consisted of 
even items of the Stanford-Binet Picture Vo- 
cabulary, Form L; odd items of Form M; 
Definition Item 2 of Form L; Definition Items 
1 and 3 of Form M; and even items of the 
Word Vocabulary. The second half consisted 
of the remaining items, omitting the last pic- 
ture of Picture Vocabulary, Form M. A sam- 
ple of 50 subjects, 10 each from the five age 
groups between 3.0 and 5.5, gave a corrected 
correlation of .93 between the two halves. 

Copy-Forms was divided into two forms 
such that the mean scale value of Form A was 
10.7 (Designs 1, 3, 6, 7, 10, 13, 15, 16, 17) 
and Form B, 10.8 (the remaining designs). 
Scale values were reported previously in Gra- 
ham, Berman, and Ernhart (1960). The cor- 
rected correlation of the two forms was .98 
for a second sample of 50 subjects distributed 
equally among the five age groups. Since the 
uncorrected correlation was .97, it is probable 
that the test could be shortened to one half 
of its present length without impairing relia- 
bility. However, the two forms were given 
on the same occasion, with items interspersed, 
so there is no way of assessing their effects 
on one another. 
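
The relation between the uncorrected and corrected Copy-Forms correlations can be checked with a short sketch. It assumes that the correction for length is the Spearman-Brown formula, which the text does not name; the function name is hypothetical.

```python
def spearman_brown(r, length_factor=2.0):
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * r / (1.0 + (length_factor - 1.0) * r)


# Uncorrected correlation between the two half-length forms, from the text.
uncorrected = 0.97

corrected = spearman_brown(uncorrected)              # full-length estimate
half_length = spearman_brown(corrected, 0.5)         # back to a half-length form
print(round(corrected, 2), round(half_length, 2))    # prints 0.98 0.97
```

Under this assumption a half-length form would retain a reliability of .97 against .98 for the full test, which is the basis for the remark that the test could probably be shortened.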

Items of the Parent Questionnaire subscales
were numbered for recording purposes in or-
der of judged similarity. Two parallel forms
could thus be obtained conveniently by using 
odd-even items from the record sheet. Since 
presentation order was randomly determined 
and varied from subject to subject, this is not 
odd-even reliability in the usual sense. Table 
9 shows this split-half reliability for a sample 
of 100 subjects distributed among all age 
groups. Reliability by 6-month age groups is 
also shown for six of the subscales. The test 
was initially devised to be used with 3-year- 
olds and many of the items were worded in 
such a way that they appear less appropriate 
to the older child. Despite this, there was no 
significant main effect of age in the analysis 
of variance (see Table 4), except for the In- 
dependence subscale, and there was no sys- 
tematic change with age in the reliability. 

Correlations between the perceptual-motor 
subtests and the battery of which they were a 
part are shown in Table 10 for the 90 subjects 
of the balanced sample who were between 3.0 
and 5.5 years. The correlations were based 
on standard scores so that age, sex, and status 
heterogeneity were removed. 

Test-Retest Reliability. Forty-six subjects 
relatively evenly distributed among sex, age, 
and status groups were re-examined after a 
6-month interval. There was no significant 
practice effect on any of the nonpersonality 


TABLE 9 


SPLIT-HALF CORRELATIONS, CORRECTED FOR LENGTH,
OF PARENT QUESTIONNAIRE SUBSCALES


                                                6-month age group
Subscale             2.5-5.5       2.5           3.0    3.5    4.0    4.5    5.0
Brain-Injury
  Hyperactivity      .89           .81           .89    .93    .93    .86    .94
  Aggressiveness     .78           .82           .83    .92    .81    .72    .89
  Emotionality       .77           [illegible]   .86    .83    .72    .82    .67
  Demandingness      [illegible]
  Unpredictability   .78
  Temperateness      .81
Maladjustment
  Inactivity         [illegible]   .76           .72    .86    .51    .81    .67
  Infantilism        .65           .62           .64    .80    .58    .65    .77
  Negativism         .77           .76           .63    .79    .87    .58    .78
  Compulsiveness     .69
  Fearfulness        .54
  Inwardness         .70
Total N              100           31            36     33     41     39     26




TABLE 10 


CORRELATIONS BETWEEN SUBTESTS AND TOTAL SCORE
ON THE PERCEPTUAL-MOTOR BATTERY


Subtest               r
Figure-Ground         .48
Localization          .36
Mark-Car Accuracy     .63
Mark-Car Mark         .58
Distraction-VE        .62
Distraction-CE        .44

Note.—N = 90.


tests. The mean change in standard score 
from test to retest was .36; four of the tests 
showed insignificant drops in mean score and 
six showed insignificant rises. 

Test-retest correlations of raw scores were 
calculated for all subjects and test-retest cor- 
relations of standard scores were calculated 
for subjects between 3 and 5.5 years. Table 
11 gives these for the nonpersonality measures. 

TABLE 11


TEST-RETEST RELIABILITY OF NONPERSONALITY
MEASURES


                        Raw score      Standard score
Test                    N      r       N      r
Vocabulary              45     .86     36     .62
Block-Sort              37     .70     33     .61
Copy-Forms              41     .86     33     .61
Motor-Coordination      33     .78     29     .49
Perceptual-Motor        —      —       34     .58
  Figure-Ground         42     .05     35     .12
  Localization          45     .60     36     [illegible]
  Mark-Car Accuracy     46     .62     37     .45
  Mark-Car Mark         46     .61     37     .38
  Distraction-VE        46     .64     37     .46
  Distraction-CE        46     .28     37     .32


The raw-score reliability coefficients for
Vocabulary, Block-Sort, Copy-Forms, and
Motor-Coordination are reasonably satisfac-
tory. Consistency is generally less in a pre-
school group than with older subjects and
these are relatively brief procedures repeated
after an unusually long interval. Standard
score reliability coefficients are less than the
raw score coefficients but this is to be expected.
Not only is the N smaller and the age range
reduced in determining standard score relia-
bility but, more importantly, a significant
portion of what normally remains constant 
from test to retest is removed by the adjust- 
ments for age, sex, and status. Standard 
scores are not, therefore, less reliable than 
raw scores: the measure of reliability is lower 
primarily because it does not include spuri- 
ous correlation. 

No measure of raw-score reliability could be 
computed for the Perceptual-Motor battery, 
but the standard score test-retest correlation 
is as high as that of the other tests. The sev- 
eral nonpersonality measures thus appear to 
be comparable in reliability of measurement. 
The subtests of the Perceptual-Motor battery, 
however, are considerably less reliable than 
the main measures. None of the correlations 
is high enough to warrant use of a subtest by 
itself and the reliability of Figure-Ground 
errors is so low as to raise the question 
whether it should not be eliminated from the 
battery. It was retained because the low re- 
liability among normal subjects could be ac- 
counted for by the narrow range of scores 
and by the large number of subjects who 
made no errors. In contrast, a substantial 
proportion of the scores of brain injured sub- 
jects exceeded the normal range. It appeared 
likely that the reliability of classifying a sub- 
ject as normal or brain injured might be 
higher than in discriminating among normal 
subjects. 

Test-retest correlations of the personality 
measures are listed in Tables 12 and 13 for 
both raw and standard adjusted scores. The 
reduction in correlation with the use of stand- 
ard scores is less than in the case of the non- 
personality measures since only a small portion 
of the variance in the personality measures 
was accounted for by age, sex, and status. 

The reliability coefficients of the Parent 
Questionnaire subscales compare favorably 
with those for the nonpersonality measures. 
The Examiner Ratings are less satisfactory, 
however, and suggest that these ratings must 
be interpreted with caution. While an effort 
was made to provide objective descriptions 
and anchoring of the rating scales, the task 
was both more complex and more subjective 
than that required of the parents. This, how- 
ever, is probably not the main reason for the 
lower test-retest reliability of the ratings since 


TABLE 12


TEST-RETEST RELIABILITY OF PARENT QUESTIONNAIRE
SCALES


Scale                 Raw score     Standard score
Brain-Injury          —             .80
  Hyperactivity       [illegible]   .73
  Aggressiveness      .76           .76
  Emotionality        .69           .69
  Demandingness       .74           [illegible]
  Unpredictability    .84           .78
  Temperateness       [illegible]   .76
Maladjustment         —             .84
  Inactivity          .66           .66
  Infantilism         .72           [illegible]
  Negativism          .84           .83
  Compulsiveness      .74           .70
  Fearfulness         .68           .67
  Inwardness          .68           .68

Note.—N = 46.


TABLE 14


CORRELATIONS BETWEEN FATHER AND MOTHER
ANSWERS TO THE PARENT QUESTIONNAIRE


Subscale              Raw score     Standard score
Brain-Injury          —             .58
  Hyperactivity       .68           .62
  Aggressiveness      .55           .53
  Emotionality        .40           .39
  Demandingness       [illegible]   [illegible]
  Unpredictability    .28           .25
  Temperateness       .51           [illegible]
Maladjustment         —             .55
  Inactivity          .07           .09
  Infantilism         .39           .36
  Negativism          .22           .30
  Compulsiveness      .72           .71
  Fearfulness         .59           .56
  Inwardness          .31           .50

Note.—N = 42.


two observers making simultaneous ratings 
of the same child could arrive at reasonably 
similar decisions (see Table 8). In fact, the 
agreement between raters was higher on the 
whole than the agreement between mothers 
and fathers when both completed the Par- 
ent Questionnaire (see Table 14). Probably 
the crucial factor is the size and representa- 
tiveness of the sample of behavior observed. 
If only a small segment of behavior is ob- 
served on each of two occasions, it is more 
likely that the observations will differ than 
would be true if the observations embraced 
a large segment of behavior. In short, a child 
may behave quite differently on two visits 


TABLE 13 


TEST-RETEST RELIABILITY OF EXAMINER RATINGS


Rating                Raw score     Standard score
Brain-Injury          —             .35
  Activity            [illegible]   [illegible]
  Demandingness       .45           .43
  Distractibility     [illegible]   .40
  Impulsivity         .20           [illegible]
Maladjustment         —             .10
  Infantilism         .63           .61
  Negativism          .39           .40
  Fearfulness         .53           .53


Note.—N = 46 except for standard score correlations of the
Distractibility and Negativism ratings where elimination of
2.5-year-olds reduced the Ns to 37 and 42, respectively.


to a psychologist, especially when these are 
6 months apart. The ratings of two psy- 
chologists on each visit might show high 
agreement but the ratings of either with the 
ratings 6 months later show little agreement. 
There were, in our experience, at least a few 
children who clearly demonstrated this. They 
were remembered because of some conspicu- 
ous behavior on the initial test, but behaved, 
on retest, in a strikingly different manner. 

A further reason for caution in ascribing 
generality to examiner ratings is a lack of 
correlation between the Examiner and the 
Parent subscales supposedly measuring the 
same characteristic. None of the Examiner- 
Parent correlations differed significantly from 
zero. 

These interrelations among observers of a 
child may be compared with those reported 
by Becker (1960). He found the aver-
age correlation of factor scores derived from
ratings to be .76 between teacher ratings, .52
between parents, .31 between mothers and 
teachers, and .28 between fathers and teachers. 
He ascribes the low correlation between 
teachers and parents to differences in the 
sample of behavior available to the two types
of raters.


Intercorrelations 


Intercorrelations of the nonpersonality tests 
(standard scores) were calculated for the 90 


TABLE 15


INTERCORRELATIONS AMONG NONPERSONALITY TESTS


Test                 Vocabulary    Block-Sort    Copy-Forms
Vocabulary           1.00          .17           .25*
Block-Sort           .17           1.00          .41**
Copy-Forms           .25*          .41**         1.00
Perceptual-Motor     [illegible]   .37**         [illegible]

Note.—N = 90.
* p < .05.
** p < .01.


subjects of the balanced sample who were 
between 3 and 5.5 years. Motor-Coordina- 
tion was not included since this test could 
not be scored for all subjects. The correla- 
tion matrix is shown in Table 15. Most of 
the correlations are significant but they are 
generally low. While low intercorrelations 
may be due, in part, to unreliability of the 
measures, a substantial portion of the vari- 
ance still remains unaccounted for either by 
unreliability or by factors common to the 
tests. The various tests are to a considerable
extent independent of one another. A simi- 
lar conclusion was drawn from intercorrela- 
tions of these and of the personality measures 
for the 3-year-old sample reported in Graham, 
Ernhart, Thurston, and Craft (1962).⁵


SUMMARY 


Procedures were developed to measure vo- 
cabulary skill, conceptual ability, perceptual- 
motor ability, and personality characteristics 
of preschool age children. Particular test pro- 
cedures were selected either because they had 
successfully differentiated brain injured from 
normal adults or because they measured func- 
tions relevant to theoretical questions con- 
cerning the brain injured child. These tests 
were administered to a sample of 108 normal 
children, balanced for sex, for status-group 
(white private, white clinic, and Negro), and 
for age by 6-month intervals from 2.5 to 5.5 
years. An additional 137 children were ex-
amined to provide a cross-validation sample.


5 Table 26 of Graham, Ernhart, Thurston, and
Craft (1962) contains a printer’s error. Lines 7 and
8 in the left-hand column should read, respectively,
“Binet and Composite Perceptual-Motor” and “Binet
and Perceptual-Motor test.”

1. Nonpersonality tests were scored so as 
to yield homogeneous variance and linear 
changes with age. For some tests, it was 
necessary to eliminate extreme age groups for 
whom the tests were too hard or too easy. 

2. The effects of age, sex, status group and 
their interactions were determined for each 
procedure. Age was the most important vari- 
able affecting performance on vocabulary, 
conceptual, and perceptual-motor tests. Sex 
and status were less important but, in gen- 
eral, girls were superior to boys, and the per- 
formance of status groups was in the ex- 
pected socioeconomic order. The measures of 
child personality obtained from parents were 
also influenced by status. Negro parents 
tended to judge their children less favorably 
on all scales. Examiners’ ratings of per- 
sonality, on the other hand, were more in- 
fluenced by sex and they tended to rate males 
less favorably than females. 

3. When sex, status, or age significantly 
affected performance, weights were calculated 
which would permit performance to be meas- 
ured independently of the effect. The relia- 
bility of the weights was estimated by com- 
paring the balanced sample with the 
cross-validation sample. Final weights were 
based on the combined samples. 

4. All scores were expressed in standard
form so that one function could be com- 
pared to another, within the limitations of 
such a methodology. Since differences in the 
shape of distributions are a major limitation 
in making such comparisons, distributions 
were tested for departure from normality. 

5. Several types of reliability were con-
sidered. Interexaminer agreement in scoring
was shown to be high and, in making per- 
sonality ratings, to be satisfactory. Where 
applicable, single-session reliability was esti- 
mated by split-half correlations. These also 
were satisfactory. Test-retest reliability after 
6 months was lower than single-session reli- 
ability but was satisfactory for the nonper- 
sonality tests and for the parent measures of 
child personality. The test-retest reliability 
of most of the examiner ratings of personality 
was so low, however, as to raise questions 
concerning the meaningfulness of such a 




measure. There was no significant correla- 
tion between parent and examiner measures 
of similar personality characteristics but there 
were also low correlations between father and 
mother measures of the same characteristics. 

6. Intercorrelations among vocabulary, 


conceptual, and perceptual-motor measures 
suggested that the various tests were to a 
considerable extent independent of one an- 
other. Previous work indicated no significant 
correlation between personality measures and 
the other procedures. 


REFERENCES 


BAKWIN, H. Cerebral damage and behavior disorders
in children. J. Pediat., 1949, 34, 371-381. 

BECKER, W. C. The relationship of factors in pa-
rental ratings of self and each other to the behavior
of kindergarten children as rated by mothers, 
fathers, and teachers. J. consult. Psychol., 1960, 24, 
507-527. 

BENDER, L. A Visual Motor Gestalt test and its 
clinical use. Res. Monogr. Amer. Orthopsychiat. 
Ass., 1938, No. 3. 

BENDER, L. Psychopathology of children with or- 
ganic brain disorders. Springfield, Ill.: Charles C 
Thomas, 1956. 

BENJAMIN, R. M., & THOMPSON, R. F. Differential
effects of cortical lesions in infant and adult cats 
on roughness discrimination. Exp. Neurol., 1959, 
1, 305-321. 

Benton, A. L. The revised visual retention test: 
Clinical and experimental applications. New York: 
Psychological Corporation, 1955. 

Bravtey, C. Organic factors in psychopathology of 
children. In P, H. Hoch & J. Zubin (Eds.), Psy- 
chopathology of childhood. New York: Grune & 
Stratton, 1955. Pp 85-97. 

CATTELL, P. The measurement of intelligence of in- 
fants and young children. New York: Psychological 
Corporation, 1940. 

CLEMENTS, S. D., & Prrers, J. E. Minimal brain 
dysfunctions in the school-age child. Arch. gen. Psy- 
chiat., 1962, 6, 185-197. 

E1senserc, L. Psychiatric implications of brain dam- 
age in children. Psychiat. Quart., 1957, 31, 72-92, 

Fx, M., & Benper, M. B. Perception of simultane- 
ous tactile stimuli in normal children. Neurol., 1953, 
3, 27-34. 

GOLDSTEIN, K., & SCHEERER, M. Abstract and concrete 
behavior: An experimental study with special tests. 
Psychol. Monogr., 1941, 53(2, Whole No. 239). 

GRAHAM, F. K., BERMAN, P. W., & ErNmART, C. B. 
Development in preschool children of the ability to 
copy forms, Child Develpm., 1960, 31, 339-359. 

GRAHAM, F. K., ERNHART, C. B., THURSTON, D., &
CRAFT, M. Development three years after perinatal
anoxia and other potentially damaging newborn
experiences. Psychol. Monogr., 1962, 76(3, Whole
No. 522).

GRAHAM, F. K., & KENDALL, B. S. Memory-for-designs
test: Revised general manual. Percept. mot.
Skills, 1960, 11(Monogr. Suppl. 2-VII), 147-188.

GULLIKSEN, H. Theory of mental tests. New York:
Wiley, 1950.

HEBB, D. O. The organization of behavior: A
neuropsychological theory. New York: Wiley, 1949.

KENNARD, M. A. Reorganization of motor function
in the cerebral cortex of monkeys deprived of motor
and premotor areas in infancy. J. Neurophysiol.,
1938, 1, 477-496.

KENNARD, M. A. Cortical reorganization of motor
function: Studies on series of monkeys of various
ages from infancy to maturity. Arch. Neurol.
Psychiat., Chicago, 1942, 48, 227-240.

LAUFER, M. W., DENHOFF, E., & SOLOMONS, G.
Hyperkinetic impulse disorder in children's behavior
problems. Psychosom. Med., 1957, 19, 38-49.

LEVY, S. Post-encephalitic behavior disorder—a forgotten
entity: A report of 100 cases. Amer. J.
Psychiat., 1959, 115, 1062-1067.

LORD, E., & WOOD, L. Diagnostic values in a
visuo-motor test. Amer. J. Orthopsychiat., 1942, 12,
414-428.

SILVER, A. A. Behavioral syndrome associated with
brain damage in children. Pediat. Clin. N. Amer.,
1958, 6, 687-698.

STRAUSS, A. A., & KEPHART, N. C. Psychopathology
and education of the brain-injured child. New
York: Grune & Stratton, 1955.

STRAUSS, A. A., & LEHTINEN, L. E. Psychopathology
and education of the brain-injured child. New
York: Grune & Stratton, 1947.

TERMAN, L. M., & MERRILL, M. A. Measuring
intelligence. Cambridge, Mass.: Houghton Mifflin, 1937.

TEUBER, H.-L., & RUDEL, R. G. Behaviour after
cerebral lesions in children and adults. Develpm. Med.
child Neurol., 1962, 4, 3-20.


(Received December 3, 1962) 


Vol. 77, No. 11 | Whole No. 574, 1963


Psychological Monographs: General and Applied 


BRAIN INJURY IN THE PRESCHOOL CHILD: SOME 
DEVELOPMENTAL CONSIDERATIONS: 


II. COMPARISON OF BRAIN INJURED AND NORMAL CHILDREN 1


CLAIRE B. ERNHART 
Washington University 


FRANCES K. GRAHAM, PETER L. EICHMAN, JOAN M. MARSHALL 
University of Wisconsin 
AND DON THURSTON
Washington University 


The performance of normal preschool children was compared with that of 70
brain injured preschool children, all of whom were above IQ 50 and 55 of
whom were above IQ 69. For all brain injured Ss there was evidence,
independent of the kind of behavior measured, that the brain was damaged at the
time of testing. Brain injured children were significantly but not equally
impaired in all areas measured. Personality functioning was significantly less
affected than nonpersonality. Neither the hyperkinetic personality syndrome
nor the differential pattern of impairment seen in adults was found in this
heterogeneous sample of brain injured children. It was suggested that there
are systematic differences in the effects of injury depending upon age at the
time of injury.


The first section of this report discussed
the performance of normal preschool
children who were given a test battery designed
to measure particular functions. The
functions measured were those relevant to an
investigation of brain injury and included
functions thought to be especially impervious
to impairment as well as those thought to be
especially vulnerable. The present section
compares the performance of the normal children
with that of brain injured children of the same
age. This permits an answer to the question
of whether or not brain injured children show
impairment on these tests. In testing terminology
this is a study of concurrent validity.
Matters of greater interest involve the questions
whether the pattern of performance in
brain injured children is similar to that of
normal children and whether the pattern resembles
that seen in the brain injured adult.


1 This study was supported by Research Grant
B-1550 from the National Institute of Neurological
Diseases and Blindness of the National Institutes of
Health, United States Public Health Service. Patients
were obtained through the cooperation of the
St. Louis Children's Hospital and the University of
Wisconsin Children's Hospital. We would like to
express our personal appreciation to the many individual
physicians who assisted in making children
available for study.


It has been recognized that developmental 
factors may be important in determining the
effects of brain injury but the assumption has 
usually been that differences between early 
and late brain injury are primarily a matter 
of degree, of more or less impairment, rather 
than differences in kind. The brain injured 
adult is likely to show relatively great impair- 
ment in perceptual-motor and conceptual per- 
formance and relatively little impairment in 
vocabulary knowledge. This differential pat- 
tern has been used in diagnostic evaluation of 
individual patients. Despite frequent state- 
ments that such a differential pattern can be 
used to diagnose the brain injured child, there 
is little evidence that this pattern is, in fact, 
a reliable indicator of brain injury in the 
child. Most studies purporting to demonstrate
the pattern have not been free of the criticism 
that brain injury was initially diagnosed on 
the basis of behavior that was then studied. 
It has also been suggested that young brain 
injured children exhibit a personality syn- 
drome which is virtually pathognomonic of 
brain injury, consisting of such traits as
hyperactivity, impulsivity, and distractibility.
Again, the evidence is inadequate. We do not 
know how frequent the syndrome is in normal 
children, whether or not it is more frequent 
in brain injured children than other person- 
ality patterns, or even how frequent it is in 
children with brain injury diagnosed from in- 
dependent criteria. 

Selection of a brain injured preschool sam- 
ple constituted the major problem. The evi- 
dence for brain injury should be restricted 
to diagnostic signs which are independent of 
the behavior to be measured and which are 
unequivocally related to central nervous 
system (CNS) lesion (Pond, 1961). Such 
evidence is not easy to obtain. With older 
children and adults, sensitive methods of de- 
tecting minor brain lesions are available. In 
the case of young children minor lesions are 
difficult to identify reliably and gross lesions 
are likely to be accompanied by mental re- 
tardation and seriously handicapping motor or 
sensory defects. If the brain injured sample is 
mentally retarded, it is difficult and perhaps 
impossible to obtain an adequate control sam- 
ple. Further, the study of retardates contri- 
butes little to understanding the child of diag- 
nostic as well as theoretical interest, that is, 
the child of normal or near normal intelligence 
who is suspected of being brain injured. 

The present study was limited, therefore, to 
children for whom there was unequivocal evi- 
dence that the brain was damaged at the time 
of testing. For approximately half of the sub- 
jects, there was evidence that damage had 
occurred at a given time due to identifiable 
cause. In addition, all children had a
Stanford-Binet IQ of 50 or above and were
without sensory and motor defects which would
interfere with test performance. The decision 
to accept IQ 50 as the minimal intelligence 
level was based on Benda’s (1952) suggestion 
that an IQ of 50 represented the lower limit 
of the normal distribution of intelligence. 
Every effort was made to obtain children of 
normal intelligence and, to a considerable ex- 
tent, we were successful in this. Of the total 
brain injured sample of 70 children, 55 were 
above IQ 69 and 28 were above IQ 90. It 
required several years to locate children who 
could satisfy these criteria. 


METHOD 
Procedure 


The test procedures previously described were
administered individually to each child.2 Testing was
usually carried out in the psychologist’s office but an 
occasional fearful child was tested in his own home, 
and bedridden children were examined in their hos- 
pital rooms. In some cases a complete test could 
not be obtained in a single session. Additional test 
sessions were employed as necessary until either the 
battery was completed or the child had demonstrated 
an incapacity to perform. Every effort was made 
to elicit a child’s best efforts and to measure cogni- 
tive and perceptual-motor abilities independently of 
possibly handicapping personality defects. 

The 1937 Stanford-Binet Intelligence Scale, Form 
L, supplemented when necessary by the Cattell In- 
fant Intelligence Scale, was also administered unless, 
as in two cases, there were available results of intel- 
ligence testing carried out by other psychologists 
subsequent to the brain injury. Eight children, tested 
in the early phases of the study, were included al- 
though no intelligence test results had been obtained. 
One of these children had a Vocabulary IQ of 61; 
the remainder had Vocabulary IQs above 90. Since 
the correlation between the Binet IQ and the Vo- 
cabulary IQ was .76 in the 62 children who took 
both tests, it was assumed that these children met 
the intelligence criterion. 


Subjects 


A sampling procedure should afford equal oppor- 
tunity for all children in the defined population to 
enter the study. In the case of brain injured sub- 
jects there is frequently a selection bias in favor of 
children exhibiting current symptoms marked enough 
to bring them to a doctor. This bias was avoided 
for approximately half of the subjects in the present 
study who were selected, not because they had cur- 
rent symptoms but because they had been exposed, 
at some time, to the possibility of injury. It was 
then determined whether or not there was evidence 
that injury had, in fact, occurred. The majority of 
children were located through the inpatient services 
(N = 26) and through the files (N = 31) of the
St. Louis Children’s Hospital and the University of 
Wisconsin Children’s Hospital and their associated 
clinics. Twelve children were referred by interested 
physicians and one child was transferred from the 
excluded normal subjects. 

All children on the inpatient services who appeared 
likely to meet the criteria for inclusion were ex- 
amined, either at the time or subsequently. In se- 
lecting cases from the files every chart filed under 
relevant diagnoses was considered. If a child was 
then or later within the appropriate age range, if 


2 The examinations were conducted by Claire B.
Ernhart, Phyllis W. Berman, Marguerite Craft, and
Joan M. Marshall.




there was no evidence of severe mental retardation 
or other handicapping defect, and if the family lived 
within a reasonable distance from the research cen- 
ter, the case was selected. A letter was sent to the 
parents requesting their cooperation in a study of 
the development of children who had suffered the 
injury or disease in question. As with normal
subjects, this procedure was modified later and families
with telephones were called following the introduc- 
tory letter. 

The basic criterion for inclusion was that there 
should be a lesion in the cerebral hemispheres, of any 
type and due to any etiology except genetic, and 
that the evidence for such a lesion should not be 
circular, that is, should not depend on behavioral or 
psychological evidence. 

Acceptable cases were classified into four main 
groups according to etiology: Group I: etiology 
prenatal, perinatal, or unknown but probably not 
genetic; Group II: cerebral cysts or tumors; Group 
III: cerebellar cysts or tumors with increased intra- 
cranial pressure; Group IV: known encephalopathy 
associated with identified etiology. This grouping 
was used because the evidence required to infer a 
cerebral lesion differs in the four groups. 

Table 1 diagrams the minimum conditions necessary 
to qualify for inclusion in each of the groups. Where 
etiology was unknown the neurological symptoms 
themselves should be such that either a cerebral 
lesion could be inferred from them alone or, if they 
were not unequivocally indicative of lesions, there 
should be supplementary evidence from X-ray
studies. In the case of cerebral cysts or tumors neither
neurological symptoms nor supplementary evidence 
were required beyond the operative note which 
vouched for the presence of the mass tissue. It was 
assumed that the likelihood of some damage to brain 
tissue with the occurrence and removal of any cyst 
or tumor is high, although the area affected may be 


small and may not lead to neurological residual 
symptoms. Similarly, it was assumed that some dam- 
age to cerebral tissue is also likely with the occur- 
rence and removal of a cerebellar cyst or tumor 
which is accompanied by increased intracranial pres- 
sure. Evidence of definite and current neurological 
residuals was required in the case of other etiologies, 
although the neurological symptoms did not have to 
be sufficient in themselves to infer a cerebral lesion. 
Evidence was also required that there had been dis- 
turbance of brain function at a given time, as shown 
by convulsions, disturbed level of consciousness, or 
neurological signs, and that this was associated with 
an identifiable cause, such as trauma, intoxication, 
inflammation, metabolic or vascular disorder. 
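
The minimal conditions just described can be restated as a short decision rule. The following Python sketch is offered only as an illustration of the logic of Table 1; the class, the field names, and the category labels are hypothetical conveniences and are not part of the original procedure. The additional requirements of an IQ of 50 or above and freedom from handicapping sensory or motor defects are assumed to have been checked separately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    """One candidate subject. All field names are hypothetical labels."""
    etiology: str                    # "unknown", "cerebral_mass", "cerebellar_mass",
                                     # "encephalopathy", or "genetic"
    symptoms: str = "none"           # "pyramidal", "convulsive" (other than petit mal),
                                     # "minor_or_other", or "none"
    xray_evidence: bool = False      # cerebral atrophy, or raised pressure with
                                     # craniosynostosis or hydrocephalus
    operative_note: bool = False     # operative note vouching for the mass
    raised_pressure: bool = False    # evidence of increased intracranial pressure
    acute_disturbance: bool = False  # recorded disturbance of brain function
    agent_identified: bool = False   # etiological agent documented in the record


def inclusion_group(case: Case) -> Optional[str]:
    """Return the Table 1 group if the minimal conditions are met, else None."""
    if case.etiology == "unknown":   # prenatal, perinatal, or unknown
        if case.symptoms == "pyramidal":
            return "IA"              # symptoms alone permit the inference
        if case.symptoms == "convulsive":
            return "IB"
        if case.symptoms == "minor_or_other" and case.xray_evidence:
            return "IC"              # equivocal symptoms need X-ray support
        return None
    if case.etiology == "cerebral_mass":
        return "II" if case.operative_note else None
    if case.etiology == "cerebellar_mass":
        return "III" if case.operative_note and case.raised_pressure else None
    if case.etiology == "encephalopathy":
        qualifies = (case.symptoms != "none"
                     and case.acute_disturbance
                     and case.agent_identified)
        return "IV" if qualifies else None
    return None                      # genetic etiology is excluded
```

Under these assumptions a child with minor neurological signs and no X-ray support returns None, whereas the same child with X-ray evidence of cerebral atrophy is placed in Group IC.
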
Hospital studies had been made on 66 subjects and 
included neurological examinations supplemented as 
desirable by electroencephalogram (EEG), spinal 
fluid examination, X-ray studies (skull films,
pneumoencephalogram, arteriogram, or ventriculogram),
and other laboratory procedures. Four subjects had 
not been hospitalized but had had extensive exami- 
nations, including, in three cases, electroencephalo- 
graphic and X-ray studies, in an outpatient clinic 
for cerebral palsy. If children in Etiology Group IV 
(see Table 1) had not been given a neurological ex- 
amination within 6 months of psychological testing, 
they were examined by a resident in neurology who 
was associated with the research. The medical rec- 
ords of each case were evaluated, without knowl- 
edge of psychological test results, by a neurologist 
(Eichman) to determine whether the criteria for 
brain injury had been satisfied. He also classified 
each subject according to the above scheme. The 
discharge diagnosis was usually but not necessarily 
accepted. The subjects were not classified under trau- 
matic etiology unless the trauma was sufficiently 
great to produce skull fracture, intracranial bleeding, 
or more than momentary unconsciousness. Unequivocal
neurological symptoms following trauma of lesser
degree were classified as Etiology unknown, minor
trauma, since it is difficult to separate cause and
effect in these cases, especially where there is
convulsive symptomatology. Encephalopathy due to
inflammation was classified both as to tissue affected
and according to the infectious agent. Specific evidence
accepted for these and other etiologies is shown in
Table 2.


TABLE 1

MINIMAL CONDITIONS NECESSARY FOR INCLUSION IN THE VALIDATION SAMPLE

I. Etiology: Prenatal, perinatal, or unknown, but probably not genetic
   Neurological symptoms at time of testing:
      IA. Pyramidal or extrapyramidal tract symptoms
      B.  Convulsive symptoms, except petit mal
      C.  Minor or other neurological
   Evidence:
      IA. Neurological examination
      B.  Neurological examination
      C.  X-ray evidence of cerebral atrophy or evidence of increased
          intracranial pressure with craniosynostosis or hydrocephalus

II. Etiology: Cerebral cyst or tumor
   Neurological symptoms at time of testing:
      II. None necessary
   Evidence:
      II. Operative note

III. Etiology: Cerebellar cyst or tumor with increased intracranial pressure
   Neurological symptoms at time of testing:
      III. None necessary
   Evidence:
      III. Operative note and evidence of increased intracranial pressure

IV. Etiology: Encephalopathy due to:
      A. Trauma, mechanical
      B. Intoxication
      C. Inflammation
      D. Metabolic etiology
      E. Vascular etiology
   Neurological symptoms at time of testing:
      IV. Any definite neurological symptom
   Evidence:
      IV. Neurological examination; evidence of encephalopathy and
          evidence of etiological agent from hospital record

Table 2 lists the 70 subjects who met the criteria 
and shows the minimal conditions permitting inclu- 
sion. Additional information of descriptive value has 
been included parenthetically. Table 3 shows the 
major neurological residuals present at the time of 
testing. Except for Etiology Group I, the subjects 
have a corresponding position in the etiology listing 
of Table 2 and the neurological symptoms listing of 
Table 3.

The age at which injury occurred and the time 
elapsed since injury were also recorded for each child. 
Of the children with unknown etiology, the majority 
were probably injured perinatally; they developed 
symptoms within the first 1-3 years of life. The 
first symptoms of brain tumor or cyst and the oc- 
currence of encephalopathy were distributed through- 
out the preschool age period. The time of testing 
relative to the time of injury ranged from concur- 
rent with to as long as 5 years postinjury. The mean 
time after tumor removal was 13.2 months but pre- 
sumed duration of tumor ranged from weeks to 
years. The mean time between encephalopathy and
testing was 21.8 months. No attempt was made, in 
the present study, to determine the effect of these
variables or of etiological classification. A larger
sample would be necessary for such an investigation. 

Excluded cases were those who did not meet the 
criteria for the validation study. Reasons for ex- 
clusion are shown in Table 4. Some subjects were 
examined who would not have been accepted had the 
medical record been available or complete prior to 
examination. The major reason for exclusion was 
the absence of neurological residuals when the child 
was examined for research purposes. These children 
may have permanent brain tissue damage which is 
not reflected in a neurological examination, of course. 
However, it was felt that evidence based only on 
etiology and not on current functioning was not 
compelling enough evidence to warrant inclusion in 
the present study. Such cases have been reserved for 
future investigation of the question whether or not 
such etiologies show other nonneurological evidence 
of damage. 

The distribution of age, sex, and status (white pri- 
vate, white clinic, and Negro) in the accepted vali- 
dation sample is given in Table 5 and the distribution 
of intelligence in Table 6. Initially, four children
were examined before the age of 3 years. Since many 
of the procedures could not be standardized on nor- 
mal children below 3 years, the minimal age for the 
study was accordingly raised. Children who had been 
examined when they were below 3 years were re- 
examined, from 6 months to 1 year later, and the 
re-examination was substituted for the original. Two 
other children, both tested at 40 months, could not
carry out a number of the procedures. They were
also re-examined approximately 1 year later and the
re-examination has been used in the data analysis.
Children with impaired motor function in both
hands were not tested, with the exception of one
quadraparetic child having “slight” involvement of
the upper extremities. His scores on the Copy-Form
and Motor-Coordination tests were not included in
the analysis of results. In the case of children with
hemiparesis, tasks were carried out by the unaffected
hand. Hearing and vision were judged to be adequate
for taking the psychological tests if impairment
had not been detected on neurological examination
and if the psychological examiner did not
note any difficulty in this respect. There were no
instances of impairment of hearing in the sample,
but three cases in which vision was a questionable
factor. One child had bitemporal hemianopsia. His
scores on Picture Vocabulary and Figure-Ground
were probably affected by the restriction in vision
since Picture Vocabulary errors were erratic. His
Vocabulary score was consequently based on Word
Vocabulary only and his scores on Figure-Ground
were excluded. Two other children had bilateral
papilledema. This does not usually cause severe loss
of acuity and since, in both cases, performance on
Picture Vocabulary was superior to that on the Binet
and errors were consistent ones, i.e., were confined to
the most difficult pictures, it was assumed that no
acuity reduction had occurred.


TABLE 2

SUMMARY OF CONDITIONS INCLUDED IN THE VALIDATION SAMPLE

Etiology / Evidence(a)                                N

I. Unknown                                           33
   No suspect history                                12
   Abnormal prenatal, including prematurity          11
   Neonatal apnea                                     1
   Birth injury                                       3
   Minor trauma                                       5
   Possible encephalitis                              1
   Evidence:
   IA. Neurological examination                      20
       (also cerebral atrophy diagnosed by air
       study 6, or presumed from asymmetrical
       calvarium on X-ray 2; abnormal EEG 7)
   B. Neurological examination                        9
       (also cerebral atrophy diagnosed by air
       study 2, abnormal EEG 8)
   C. Neurological examination and cerebral
      atrophy from air study                          1
      Neurological examination and increased
      intracranial pressure with:
         craniosynostosis                             1
         hydrocephalus

II. Cerebral cyst or tumor                            7
   Papilloma, choroid plexus
   Parietal-temporal astrocytoma
   Parietal-temporal ependymoma
   Intraventricular cyst
   Craniopharyngioma
   Subdural hygroma and hematoma
   Subdural hematoma and arteriovenous anomaly        1
   Evidence:
   II. Operative note                                 7
       (also neurological examination 7)

III. Cerebellar cyst or tumor and increased
     intracranial pressure                            5
   Cerebellar astrocytoma                             2
   Cystic astrocytoma of the vermis                   1
   Cerebellar meningoencephalocele                    1
   Cerebellar ependymoma                              1
   Evidence:
   III. Operative note and increased intracranial
        pressure                                      5
        (shown by papilledema 4, LP pressure > 200 3,
        sutural diastasis 3, fontanelle tense 1)

IV. Encephalopathy due to:
   A. Trauma, mechanical                              5
      Evidence:
      Neurological examination                        5
          (also abnormal EEG 2)
      CNS involvement following accident              5
          (shown by unconsciousness 4, craniotomy for
          brain laceration 1, subdural hematoma 1,
          skull fracture 4)

   B. Intoxication                                    6
      Lead                                            5
      Generalized burns                               1
      Evidence:
      Neurological examination                        6
      CNS involvement during illness                  6
          (shown by convulsions 3, hemiparesis 1,
          other neurological 1, disturbed level of
          consciousness 6)
      Agent: blood Pb level > .06 mg % or
      urine level > .06 mg/l                          5
          (also increased intracranial pressure 4,
          heavy metal osteopathy 5)
      Agent: extensive first and second degree burns  1

   C. Inflammation                                    9
      Meningitis, bacterial                           4
      Meningitis, viral                               1
      Encephalitis, viral                             3
      Meningoencephalitis, viral                      1
      Evidence:
      Neurological examination                        9
          (also abnormal EEG 4)
      Meningeal involvement during illness            5
          (shown by pleocytosis 4, hospital
          diagnosis 1; also nuchal rigidity 5)
      CNS involvement during illness                  3
          (shown by positive neurological findings
          other than meningeal or general 3)
      Meningoencephalitic involvement during illness  1
          (shown by pleocytosis and positive
          neurological findings 1)
      Associated generalized infection                9
          (shown by fever 8, disturbed level of
          consciousness 7, convulsions 5)
      Agent bacterial                                 4
      Agent viral                                     5
          (shown by culture 3, other lab tests 5,
          clinical diagnosis of primary disease 1)

   B and C. Intoxication and inflammation             1
      Lead intoxication and tuberculous meningitis    1
      Evidence:
      Neurological examination                        1
      CNS involvement during illness, as shown by
      pleocytosis, nuchal rigidity, convulsions,
      hemiparesis                                     1
      Agent: urine Pb level > .06 mg/l,
      M. tuberculosis cultured                        1

   D. Metabolic                                       3
      Hypoglycemia, due to:
         metabolic defect                             2
         pancreatic adenoma                           1
      Evidence:
      Neurological examination and blood glucose
      levels < 15 mg %                                3
          (also abnormal EEG 1)

   E. Vascular                                        1
      Obstruction of left internal carotid artery
      Evidence:
      Neurological examination and carotid
      arteriogram                                     1

a Evidence sufficient to permit inclusion is shown for all subjects. Supporting
evidence and additional information are shown in parentheses. Numbers within
parentheses are not additive to totals since a single subject may be classified
under more than one kind of evidence.


TABLE 3

SUMMARY OF NEUROLOGIC SYMPTOMS IN THE VALIDATION SAMPLE

Etiology / Neurological symptoms at time of testing   N

I. Unknown                                           33
   IA. Pyramidal or extrapyramidal tract motor
       symptoms:
          occurring alone                            10
          with convulsions                           10
   B. Convulsions only:
          grand mal                                   4
          focal                                       5
   C. Other:
          Ataxia, facial weakness
          Lateral nystagmus
          Gait disturbance

II. Cerebral cyst or tumor                            7
      Hemiparesis
      Hemiparesis, convulsions
      Hemiparesis, ataxia
      Convulsions, focal
      Ataxia, 6th nerve palsy
      Bitemporal hemianopsia
      Febrile convulsions

III. Cerebellar cyst or tumor and increased
     intracranial pressure                            5
      Unknown
      Hemiparesis
      Ankle clonus, central facial weakness
      Ataxia
      Posterior fossa signs

IV. Encephalopathy, due to:
   A. Trauma, mechanical                              5
      Hemiparesis
      Convulsions, grand mal
      Convulsions, focal
      None(a)                                         2

   B. Intoxication                                    6
      Unilateral hyperreflexia
      Unilateral hyperreflexia and dystonia
      Hyperreflexia and hypotonia
      Convulsions, focal
      Hemiparesis and central facial weakness

   C. Inflammation                                    9
      Hemiplegia, convulsions
      Enlarged head, questionable eye movements       1
      Positive Babinski, foot incoordination          1
      Ataxia                                          1
      Unilateral ataxia, gait disturbance             1
      Hemiparesis, foot drop                          1
      Convulsions                                     3

   B and C. Intoxication and inflammation             1
      Hemiparesis and central facial weakness         1

   D. Metabolic                                       3
      Convulsions, grand mal                          1
      Convulsions, grand mal and variable
      dysfunction of CNS, including ataxia            2

   E. Vascular                                        1
      Hemiplegia                                      1

a A 3 X 8 X 3 cm. defect in the right frontal lobe resulted from an automobile
accident and was surgically removed. By the logic of evidence this case belongs
in group II for which evidence of current neurological symptoms was not required.


TABLE 4

SUBJECTS EXCLUDED FROM THE BRAIN INJURED
VALIDATION SAMPLE

Reason for exclusion                          Frequency

Low IQ, below 50                                      2
Genetic or obscure etiology                           3
   Endogenous retardation with albinismus             1
   Hereditary degenerative disease                    1
   Mongolism or Down's Syndrome                       1
No evidence of brain lesion                           5
   Behavior disorder                                  2
   Cerebral palsy siblings                            1
   Microcrania                                        1
   Multiple congenital anomalies                      1
No evidence of cerebral lesion                        2
   [subentries illegible]
No known residuals of unverified
[remainder of heading illegible]                      7
   Erythroblastosis fetalis, severe
   Unverified hypothyroidism
   Unverified skull fractures                         3
   [other subentries illegible]
No known residuals of known encephalopathy            7
   Traumatic
   [illegible]                                        5
   [illegible]
Total N                                              26


TABLE 5

DISTRIBUTION OF AGE, SEX, AND STATUS GROUP IN THE
BRAIN INJURED SAMPLE

                 Lower limit of 6-month
Group            age interval [labels illegible]    Total N

M W Pr           [illegible]                             13
M W Cl           [illegible]                             10
M N              [illegible]                              9
   Total N       [illegible]                             32

F W Pr            4    2    3    4    6                  19
F W Cl           [illegible]                             15
F N              [illegible]                              4
   Total N       [illegible]

Grand total N    17   12   13   12   16                  70

Note.—M = male, F = female, W = white, N = Negro,
Pr = private, Cl = clinic.


TABLE 6

DISTRIBUTION OF STANFORD-BINET INTELLIGENCE IN
THE BRAIN INJURED SAMPLE

IQ                 N

50-59              3
60-69              9
70-79             15
80-89             [illegible]
90-99             [illegible]
100-109            3
110-119            2
120 and above      2
Not given          8
Total             70


RESULTS



The performance of brain injured children 
was inferior to that of normal controls on 
most of the nonpersonality procedures. This
was true whether the total brain injured sam- 
ple was compared with the normal standard- 
ization sample or whether comparison was 
limited to those brain injured subjects with 
intelligence above IQ 69. The differences 
were less but were generally significant even 
when adjusted for covariance with vocabu- 
lary. Impairment in personality functioning 
was not as great and depended, in part, on 
whether the parent or the examiner rated the
child.

Nonpersonality Measures 


Differences between the Total Brain Injured
and the Normal Samples. Mean scores
and standard deviations obtained by the brain 
injured sample on each of the nonpersonality 
tests are shown in Table 7.


TABLE 7

SIGNIFICANCE OF DIFFERENCES BETWEEN THE TOTAL
BRAIN INJURED SAMPLE AND THE NORMAL SAMPLE

                                  Brain injured
Test                  N(a)        Mean(b)    s(b)

Vocabulary            185, 70     35.3**     14.1**
Block-Sort            159, 67     39.4**     17.1**
Copy-Forms            126, 60     35.9**     14.7**
Motor-Coordination    105, 32     39.8**     12.8
Perceptual-Motor      175, 57     35.9**     14.9**
Figure-Ground         183, 61     41.4**     14.6**
Localization          178, 63     40.2**     12.3*
Mark-Car Accuracy     184, 68     41.7**     10.2
Mark-Car Mark         184, 68     37.3**     17.8**
Distraction-VE        182, 67     44.2**     10.6
Distraction-CE        182, 67     46.8        9.7

Note.—Probabilities of mean differences were determined by
the Cochran-Cox method of estimating t ratio distributions.
a Ns are listed in order for the normal and the brain injured
groups.
b Scores are in standard score form. The comparison normal
group means equal approximately 50 with s equaling approximately 10.
* p < .05.
** p < .01.


The scores were adjusted for age, sex, and status as described
in the first section of this report and were 
transformed to a common scale on which the 
mean of the standardization sample was 50 
and the standard deviation 10. Except for the 
Distraction-CE subtest, performance of the 
brain injured group was significantly inferior 
to that of normal subjects. The normal group 
with whom comparison was made consisted 
of the standardization sample and additional 
normal subjects described in the first section. 
Since most of the test variances were also sig- 
nificantly different in the two groups, the sig- 
nificance of mean differences was tested by the 
Cochran-Cox (1957, p. 101) approximation. 
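
The two computations named in this paragraph can be sketched briefly. The Python fragment below is an illustration under stated assumptions rather than a record of the original analysis: the function names are hypothetical, the normal group is taken as exactly 50 and 10 although Table 7 gives these only as approximate, and the two tabled critical values of t (about 2.60 and 2.65 for the .01 level, two-tailed, with 184 and 69 degrees of freedom) are rounded values supplied by the user. The age, sex, and status adjustment is not reproduced.

```python
import math


def to_standard_score(raw, norm_mean, norm_sd):
    """Rescale a score so that the standardization sample has mean 50, SD 10."""
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


def cochran_cox(mean1, sd1, n1, mean2, sd2, n2, t_crit1, t_crit2):
    """Cochran-Cox approximation for comparing means with unequal variances.

    t_crit1 and t_crit2 are the tabled critical values of t for n1 - 1 and
    n2 - 1 degrees of freedom at the chosen level (supplied by the caller).
    Returns the observed ratio t' and its weighted critical value.
    """
    w1 = sd1 ** 2 / n1          # squared standard error of the first mean
    w2 = sd2 ** 2 / n2          # squared standard error of the second mean
    t_prime = (mean1 - mean2) / math.sqrt(w1 + w2)
    t_crit = (w1 * t_crit1 + w2 * t_crit2) / (w1 + w2)
    return t_prime, t_crit


# Vocabulary row of Table 7: normal group (N = 185) against brain injured (N = 70).
t_prime, t_crit = cochran_cox(50.0, 10.0, 185, 35.3, 14.1, 70,
                              t_crit1=2.60, t_crit2=2.65)
print(f"t' = {t_prime:.2f}, critical value = {t_crit:.2f}")
```

With these assumed inputs the observed ratio is about 8 and the weighted critical value lies between the two tabled values, so the difference would be judged significant at the .01 level, in agreement with the entry in Table 7.
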

Greater variability among abnormal sub- 
jects has frequently been reported and may 
have more than one explanation. Our brain 
injured group was composed of subjects who 
probably sustained injury to different parts of 
the brain, to different degrees, and for dif- 
ferent lengths of time. Consequently, the 
specific defects associated with injury might 
be expected to vary from subject to subject. 
The group performance on any one function 
would thus be more variable than normal if 
the group consisted both of individuals who 
were impaired in that particular function and 


individuals who were normal in that function, 
Brain injury may also increase variability if 
injury is associated with general characteris- 
tics, such as distractibility, which affect per- 
formance. A distractible individual, respond- 
ing to momentary changes in the environment, 
might give an inferior performance whenever 
there were minor environmental changes, no 
matter what function was being measured at 
the time. Since environmental changes could 
occur during one test period and not during 
another, the performance might also be less 
reliable. Similarly, the group performance of 
such distractible subjects would be more vari- 
able if minor environmental changes occurred 
while some individuals were being tested and 
not while others were tested. 

The effect both of specific defects, varying 
from individual to individual, and of general 
personality characteristics, affecting perform- 
ance at one time and not at another, would 
be to increase variability by extending the 
range of performance downward. Neither of 
these factors should lead to a performance 
better than normal. Graphed distributions 
of normal and brain injured group scores show 
this to be the case. In no instance was the 
increased variability in the brain injured 
group symmetrical about the mean of the nor- 
mal group. Rather, the curve was displaced 
downward with a greater proportion of brain 
injured than of normal subjects at each point 
below the mean and a small proportion of 
brain injured subjects with scores below the 
poorest normal subject. This phenomenon is 
illustrated in Figures 1 and 2, which show the 
distributions of Copy-Forms and Block-Sort
scores. As noted previously (Graham, Ernhart, 
Thurston, & Craft, 1962, p. 25), the peakedness
of the distributions is apparently a function
of using adjusted scores which remove
some of the sources of variability usually seen
in test distributions. The restricted brain injury
group, also graphed in Figures 1 and 2,
will be discussed below. This group was composed
of subjects with IQs above 69 and its
distributions did not show the slight and insignificant
tendency towards bimodality present
in the total brain injured group.

Fig. 1. Distributions of standard scores on the
Copy-Forms test.

Fig. 2. Distributions of standard scores on the
Block-Sort test.

While the performance of the brain injured 
group was significantly poorer than that of 
normal subjects on all of the main tests, the 
question remains of whether test differences 
could be ascribed to differences in intelligence.
This question is complicated by the fact that 
intelligence itself may be affected by brain 
injury so that “holding intelligence constant,” 
when determining the effects of brain injury 
on some other function, does not give an esti- 
mate of impairment in that function “freed” 
of the effects of intelligence. The general 
problem of the meaningfulness of equating for 
a variable which is affected by the “independent”
variable has been discussed by statistical
writers (e.g., Lindquist, 1953; Smith,
1957). 

Graham and Berman (1961) also considered
the specific problem of equating for
intelligence when tests of brain injury were 
being standardized. In addition to the above 
difficulty, they noted that tests of intelligence 
are themselves composite measures which in- 
clude many of the special functions under 
investigation. Removing the variance asso- 
ciated with intelligence may also remove some 
of the variance which a study is measuring. 
They suggested two alternatives: equating for
“sophistication” of background and equating
for information or vocabulary abilities. The
first method has been employed by Hebb 
(1949, p. 279) and the adjustments of the 
present study accomplish the same end. The 
rationale for the second method was that vo- 
cabulary, while highly correlated with intel- 
ligence, does not directly (i.e., obviously) 
measure the specific functions measured by 
perceptual-motor tests or nonverbal tests of 
conceptual ability. Any correlation between 
vocabulary and these tests would presumably 
be due to a general intelligence or g factor, 
or to nonspecific personality factors or modes 
of response. 

If the subjects were equated for vocabulary, 
it could then be determined whether or not 
there was any significant impairment in other 
abilities, beyond that which was also meas- 
ured by a vocabulary test. Equating was first 
attempted by comparing differences between 
the scores of the brain injured and normal 
groups when adjusted for covariance with vo- 
cabulary. However, the covariance method, 
which uses the average regression, was not 
applicable. As Table 8 shows, correlations of 
test score with Vocabulary score were low in 
the normal group and generally higher in the 
brain injured group. The regression coeffi- 
cients of test score on Vocabulary were also 
different in the two groups and, in most cases, 
significantly so (see Table 8). To estimate a
regression coefficient as the average of two
significantly different regression coefficients
is a questionable procedure and the analysis
was not attempted. However, since it seemed
probable that the high correlations in the
brain injured group were a function of its
greater heterogeneity, analyses of a more restricted
group were carried out.


TABLE 8

CORRELATIONS BETWEEN VOCABULARY AND TEST SCORES
IN THE TOTAL BRAIN INJURED AND NORMAL GROUPS:
F RATIOS OF THE DIFFERENCES IN REGRESSION
COEFFICIENTS OF THE TWO GROUPS

                      Correlations of test
                      and Vocabulary scores     F ratios of
Test                  Normal    Brain injury    differences
                      group     group           in regression
Block-Sort            .24       .51              9.58*
Copy-Forms            .26       .69             14.60*
Motor-Coordination    .16       .48              9.51*
Perceptual-Motor      .37       .68              9.35*
Figure-Ground         5         .51              9.66*
Localization          .26       .49              2.50
Mark-Car Accuracy     .17       .44              1.42
Mark-Car Mark         .23       .57             14.04*
Distraction-VE        .25       .47              1.48
Distraction-CE        .13       .20             <1.00

* p < .01.
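
The check on the regression coefficients described above can be sketched as a test of homogeneity of slopes. The version below is a minimal illustration under the usual textbook formulation, not necessarily the computation used for Table 8, and the function names and sample data are hypothetical. It compares the residual sum of squares when each group keeps its own slope with the residual when both groups are forced to share the pooled within-group slope.

```python
def _sums(x, y):
    """Corrected sums of squares and cross-products for one group."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return sxx, sxy, syy


def slope_homogeneity_f(x1, y1, x2, y2):
    """F ratio (1 and n1 + n2 - 4 df) for the difference between two slopes."""
    sxx1, sxy1, syy1 = _sums(x1, y1)
    sxx2, sxy2, syy2 = _sums(x2, y2)
    # Residual SS when each group is fitted with its own slope.
    ss_separate = (syy1 - sxy1 ** 2 / sxx1) + (syy2 - sxy2 ** 2 / sxx2)
    # Residual SS when both groups share the pooled within-group slope.
    ss_common = (syy1 + syy2) - (sxy1 + sxy2) ** 2 / (sxx1 + sxx2)
    df_error = len(x1) + len(x2) - 4
    return (ss_common - ss_separate) / (ss_separate / df_error)


# Hypothetical data: x is a vocabulary score, y a test score.
vocab = [1, 2, 3, 4, 5]
group_a = [2.1, 2.9, 4.2, 4.8, 6.0]   # slope near 1
group_b = [1.0, 3.1, 4.9, 7.2, 8.8]   # slope near 2
print(slope_homogeneity_f(vocab, group_a, vocab, group_b))
```

A large F indicates that a single average slope misrepresents both groups, which is the situation the text describes for the total brain injured group.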

Differences between the Restricted Brain 
Injured and the Normal Samples. The re- 
stricted brain injured group included only the 
48 subjects with Binet IQs greater than 69 
and the 7 subjects who were not given the 
Stanford-Binet but whose Vocabulary IQs 
were above 90. The aim was to select a group 
whose general level of functioning was rela- 
tively homogeneous and also more similar to 
that of the normal sample. Selection was based 
on intelligence rather than vocabulary to 
avoid automatic elimination of subjects who 
might be impaired in verbal ability and supe- 
rior in nonverbal ability. The 7 subjects with- 
out Binet scores were included when examina- 
tion of the remaining cases indicated that a 
discrepancy between vocabulary and nonver- 
bal abilities, sufficiently large so that general 
intelligence would fall below 69 when Vocabu- 
lary IQ was above 90, was improbable. 

Analysis of the differences between brain 
injured and normal group scores, adjusted for 
the covariance with Vocabulary score, could 
be carried out on this restricted group. Except 
for the Mark-Car subtest, the regression co- 
efficients of test score on Vocabulary score 
did not differ significantly in the two groups. 
Table 9 gives the mean test scores in the re- 
stricted brain injury group, and the mean 
scores adjusted for covariance with vocabu- 
lary. Comparison of these means with those 
for the total brain injury group (see Table 7) 
shows, as would be expected, that the re- 
stricted brain injury group was superior to 
the total brain injury group, even before cor- 
rection in terms of vocabulary. It should be 
noted that restriction of the brain injured 
group to subjects above IQ 69 reduced the 
variability of the group, although it was still 
more variable than a normal group, and re- 
duced the range of test scores, as illustrated 
in Figures 1 and 2. F ratios for the difference
between the adjusted means of the normal
and the restricted brain injury groups are also
shown in Table 9. With the exception of the
Distraction subtests, there was significant impairment
on the tests beyond the impairment
which was also measured by a vocabulary test.


TABLE 9

RESTRICTED BRAIN INJURY GROUP MEAN SCORES AND
MEAN SCORES ADJUSTED FOR VOCABULARY COVARIANCE:
F RATIOS OF THE DIFFERENCES IN THE
ADJUSTED MEANS OF THE RESTRICTED BRAIN
INJURY AND NORMAL GROUPS

Test                  N a    X b     Adjusted    F ratio
                                     X b
Vocabulary            55     39.0    -           -
Block-Sort            55     42.7    45.2         7.82**
Copy-Forms            47     39.7    43.6        11.62**
Motor-Coordination    30     41.2    43.9         7.45**
Perceptual-Motor      50     38.1    42.8        20.52**
Figure-Ground         51     44.2    46.2         4.59*
Localization          54     41.7    44.9        10.15**
Mark-Car Accuracy     54     43.1    45.5         6.80**
Mark-Car Mark         54     40.6    44.0        10.03**
Distraction-VE        54     45.2    47.7         1.72
Distraction-CE        54     46.9    48.6        <1.00

a Ns for the normal group are given in Table 7.
b Scores are in standard score form. The comparison normal
group unadjusted Xs equal approximately 50 and, after
adjustment for vocabulary covariance, average 49.25. To facilitate
comparisons, adjusted Xs of the restricted brain injury
group have been transformed to a scale on which the adjusted
Xs of the comparison normal group equal exactly 50.
* p < .05.
** p < .01.
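
The covariance adjustment behind these means can be illustrated with a short sketch. It assumes the standard analysis of covariance formula, in which each group mean is corrected by the pooled within-group slope times the distance of the group's covariate mean from the grand covariate mean. The function names and the data are hypothetical and are not taken from the study.

```python
def pooled_within_slope(groups):
    """Pooled within-group regression slope of y on x.

    groups is a list of (x_values, y_values) pairs, one per group.
    """
    sxx = sxy = 0.0
    for x, y in groups:
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxx += sum((a - mx) ** 2 for a in x)
        sxy += sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def adjusted_means(groups):
    """Group means of y adjusted to the grand mean of the covariate x."""
    b = pooled_within_slope(groups)
    n_total = sum(len(x) for x, _ in groups)
    grand_x = sum(sum(x) for x, _ in groups) / n_total
    return [sum(y) / len(y) - b * (sum(x) / len(x) - grand_x)
            for x, y in groups]


# Hypothetical standard scores: x = Vocabulary, y = a perceptual-motor test.
normal = ([48, 50, 52, 54], [49, 50, 51, 52])
injured = ([36, 38, 40, 42], [39, 40, 41, 42])
print(adjusted_means([normal, injured]))
```

In this made-up example the unadjusted difference of 10 points shrinks to 4 after adjustment, which mirrors the way the adjusted means in Table 9 lie closer to 50 than the unadjusted ones.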


Personality Measures 


Differences between the Total Brain In- 
jured and Normal Samples. Mean scores and 
standard deviations obtained by the brain in- 
jured group on the Parent Questionnaire sub- 
scales are shown in Table 10. Results on the 
Examiner Ratings are shown in Table 11. As
with the nonpersonality tests, variances tended 
to be larger in the brain injured sample and 
were significantly so on a number of the sub- 
scales. This was taken into account in evalu- 
ating mean differences. 

While mean differences between the brain 
injured and normal groups were significant on 
all but one of the Examiner Ratings, this was 
not true of the Parent Questionnaire subscales. 
Differences were generally insignificant on the 
Parent subscales describing characteristics 
thought to be typical of brain injured children
and generally significant on the subscales labeled
maladjustment. In this heterogeneous
group of brain injured children, the parents
did not describe their children as showing
what has been called the hyperkinetic behavior
syndrome, although they did perceive
them as having behavior characteristics which
may be called undesirable.

Differences between the Restricted Brain
Injured Group and the Normal Sample. An
analysis was made of differences between
groups, on the four composite personality
scores, when means were adjusted for the
covariance with Vocabulary score. As with
the nonpersonality measures, the comparison
was carried out with the restricted rather than
the total brain injured group although, in this
case, correlations between personality scores
and Vocabulary were generally low in both
groups and differences in regression were significant
in only one case (see Table 12).

Table 13 gives the mean scores in the restricted
brain injury group, and the mean
scores adjusted for covariance with vocabulary.
On two scales there was significant impairment
in personality functioning beyond
the impairment which was also measured by
a vocabulary test, but the Examiner Maladjustment
scale, which showed significant impairment
in the total brain injured group, no
longer differentiated the samples.


TABLE 10

SIGNIFICANCE OF DIFFERENCES ON THE PARENT QUESTIONNAIRE
BETWEEN THE TOTAL BRAIN INJURED
AND NORMAL SAMPLES

                      Brain injured group
Scale                 X a        s a
Brain-Injury          47.6       11.4
Hyperactivity         48.5        9.7
Aggressiveness        49.7       11.5
Emotionality          47.3       11.2
Demandingness         48.8        9.0
Unpredictability      46.5*      12.4*
Temperateness         47.8       12.8**
Maladjustment         43.4**     13.1**
Inactivity            45.5**     12.3*
Infantilism           44.5**     11.8
Negativism            46.8*      10.3
Compulsiveness        46.8*      10.2
Fearfulness           49.6       11.2
Inwardness            41.7**     13.5**

Note. Probabilities of mean differences were determined by
Cochran-Cox method of estimating the t ratio distribution.
a Scores are in standard score form and are compared to a
normal group X = 50, s = 10. The brain injured group N = 64
and normal group N = 206.
* p < .05.
** p < .01.


TABLE 11

SIGNIFICANCE OF DIFFERENCES ON THE EXAMINER
RATINGS BETWEEN THE TOTAL BRAIN INJURED
AND NORMAL SAMPLES

                                 Brain injured group
Rating                N a        X b        s b
Brain-Injury          123, 63    39.6**     16.0**
Activity              217, 70    45.4*      17.1**
Demandingness         217, 70    44.3**     14.5**
Distractibility       186, 70    38.7**     13.8**
Impulsivity           151, 63    40.4**     15.3**
Maladjustment         199, 70    44.7**      9.7
Infantilism           217, 70    42.4**     11.1
Negativism            199, 70    45.0**     13.1**
Fearfulness           217, 70    49.9       10.0

Note. Probabilities of mean differences were determined by
Cochran-Cox method of estimating the t ratio distribution.
a Ns for the normal and brain injured samples are listed
in that order.
b Scores are in standard score form and are compared to a
normal group X = 50, s = 10.
* p < .05.
** p < .01.


Individual Differentiation


Group differences between brain injured
and normal children were demonstrated on
most of the tests considered. It is also of interest
to know how well a test, or tests, can
identify individual members of a brain injured
group without misidentifying normal children.
If a cutoff point were selected at two standard
deviations below the mean, about 2.5% of
normal children would be classified as abnormal.
Table 14 shows that the percentage of
brain injured children falling below this point
was significantly greater than the percentage
of normals on seven of the eight main
measures.


TABLE 12

CORRELATIONS BETWEEN VOCABULARY AND PERSONALITY
SCORES IN THE TOTAL BRAIN INJURED AND
NORMAL GROUPS: F RATIOS OF THE DIFFERENCES
IN REGRESSION COEFFICIENTS OF THE
TWO GROUPS

                          Correlations of scale
                          and Vocabulary scores     F ratios of
Personality scale         Normal    Brain injury    differences
                          group     group           in regression
Parent Brain-Injury       .18       .14             <1
Parent Maladjustment
Examiner Brain-Injury     .03       .36
Examiner Maladjustment    .34       .24             2.77

* p < .05.


TABLE 13

RESTRICTED BRAIN INJURY GROUP MEAN SCORES AND
MEAN SCORES ADJUSTED FOR VOCABULARY COVARIANCE:
F RATIOS OF DIFFERENCES IN THE
ADJUSTED MEANS OF THE RESTRICTED
BRAIN INJURY AND NORMAL GROUPS

Personality scale         N a        X b     Adjusted    F ratio
                                             X b
Parent Brain-Injury       181, 49    49.1    50.1        <1.0
Parent Maladjustment      181, 49    44.8    46.1        4.10*
Examiner Brain-Injury     122, 49    42.3    45.0        4.99*
Examiner Maladjustment    185, 55    46.3    49.5        <1.0

a Ns for the normal and brain injured samples are listed in
that order. Normal group Ns are less than in Tables 10 and
11 since Vocabulary scores were not standardized on 2.5-year-olds
and these subjects could not, therefore, be included in
the covariance analysis.
b See Footnote b, Table 9.
* p < .05.

With such a low cutoff point, no one of the 
tests selected a substantial number of brain 
injured. However, a composite index of performance
would be expected to be more discriminating.
Such an index was derived for
each individual by counting the number of 
his scores falling below 1, 2, and 3 standard 
deviations on the seven discriminating meas- 
ures. Only children who obtained scores on 
at least 5 of the measures were considered. 
Distributions of the index in both the total 
and the restricted brain injured groups dif- 
fered significantly from the distribution in the 
normal group (chi square = 104.0 and 80.1, 
p< .001). The percentage of children falling 
below different cutting points is shown in 
Table 15.
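
The counting procedure just described can be sketched in a few lines. This is an illustrative reading of the text rather than the authors' scoring program: the function names are hypothetical, missing measures are represented by None, and the decision rule shown is the "two scores below -2 SD or one score below -3 SD" cutting point of Table 15. Scores are assumed to be on the standard scale with a normal mean of 50 and standard deviation of 10.

```python
def extreme_counts(scores, mean=50.0, sd=10.0):
    """Count available scores falling below 1, 2, and 3 SDs under the mean."""
    available = [s for s in scores if s is not None]
    return {k: sum(s < mean - k * sd for s in available) for k in (1, 2, 3)}


def flagged(scores, min_scores=5):
    """Apply the 'two scores < -2 SD or one score < -3 SD' cutting point.

    Returns None when fewer than min_scores measures are available,
    since such children were not considered.
    """
    if sum(s is not None for s in scores) < min_scores:
        return None
    counts = extreme_counts(scores)
    return counts[2] >= 2 or counts[3] >= 1


# Hypothetical profiles on the seven discriminating measures.
child_a = [52, 48, 29, 28, 45, 50, 41]      # two scores below 30
child_b = [52, 48, 29, 38, 45, 50, 41]      # only one score below 30
child_c = [19, None, None, None, 50, None, None]  # too few measures
print(flagged(child_a), flagged(child_b), flagged(child_c))
```

Other rows of Table 15 correspond to looser or stricter rules on the same counts, for example `counts[2] >= 1` for "at least one score below -2 SD".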

A cutoff point selected to give a minimum
of false positives, 2%, would correctly identify
approximately half of the total brain injured
group and 43% of the restricted brain injured
group. A cutoff point selected to give minimal
overlap would misidentify approximately 10%
of the normal cases and would correctly identify
approximately three fourths of the total
and two thirds of the restricted brain injured
groups. Thus, brain injured children not only
were more impaired, as a group, than normal
children but a large number of individual
children in the brain injured group obtained
sufficiently extreme scores so that they could
be identified as deviant individuals. Approximately
one third of the brain injured children
obtained scores that were worse than the poorest
normal child.


TABLE 14

PERCENTAGE OF STANDARD SCORES BELOW 30 IN THE
BRAIN INJURED AND NORMAL SAMPLES

Test                      Normal    Brain injured    Chi square
                          group     group
Vocabulary                1.0       32.9             57.1***
Block-Sort                3.8       29.9             29.2***
Copy-Forms                 .8       30.0             34.7***
Perceptual-Motor          1.7       26.3             33.0***
Parent Questionnaire
  Brain-Injury            3.3       10.9              4.5*
  Maladjustment           4.2       17.2             10.5**
Examiner Ratings
  Brain-Injury            4.1       23.8             14.9***
  Maladjustment           2.0        4.3               .4

* p < .05.
** p < .01.
*** p < .001.


TABLE 15

PERCENTAGE OF SCORES FALLING BELOW DIFFERENT
CUTTING POINTS

                                  Normal       Brain       Restricted
Cutting point                     group        injured     brain injured
                                  (N = 160)    (N = 66)    (N = 53)
At least one score < -1 SD        51.9         90.9        88.7
At least one score < -2 SD        10.6         72.7        66.0
At least two scores < -2 SD
  or one score < -3 SD             1.9         53.0        43.4
At least one score < -3 SD         0           34.8        26.4

The Vocabulary scale was one of the most 
discriminating of the individual tests, but the 
greatest diagnostic problem is usually pre- 
sented by children who are normal in language 
skill. Even for such children the remaining 
tests had considerable discriminating power. 
Of 22 brain injured children with vocabulary 
standard scores of 40 or above, 45% fell below 
the 2 standard deviation cutting point for the 
remaining tests. 


DISCUSSION 


The present study compared the perform- 
ance of a group of brain injured preschool 
children with the performance of normal chil- 
dren on tests which permitted statistical con- 
trol of the irrelevant variables—age, sex, and, 
more grossly, socioeconomic status. The brain 
injured group was significantly inferior on 
most of the tests. 

It is not, however, the fact of inferiority 
which is of primary interest but rather, the 
nature of functioning in the brain injured 
child. Does he function like a normal child 
but at a generally lower level, or is the pattern 
of functioning different? How does he com- 
pare with a brain injured adult? The present 
results permit some tentative answers. The 
test battery was devised to measure several 
types of broad functions—language, concep- 
tual, perceptual-motor, and personality. The 
grounds for treating these as “functions” were 
discussed by Graham, Ernhart, Thurston, and 
Craft (1962, p. 34). The argument rested 
primarily on their face validity and on an 
appeal to precedent. However, intercorrela- 
tions among tests were sufficiently low to jus- 
tify the assumption that they measured differ- 
ent things, at least in large part. No attempt 
was made to obtain statistical factors. 

To permit comparison among tests, per- 
formance was expressed in standard score 
form and estimates of reliability and distribu- 
tion shape were obtained. Three of the main 
measures—Vocabulary, Copy-Forms, and the 
Perceptual-Motor battery—did not differ sig- 
nificantly from a normal distribution, while 


the remaining measures were negatively 
skewed. However, skewness in this direction 
probably does not seriously bias comparisons 
of the degree of impairment. Negative skew- 
ness indicates a restricted spread of scores in 
the superior direction, but none of the tests 
showed the opposite phenomenon, i.e., re- 
stricted spread in the direction of poor 
performance. 

Results indicated that the pattern of func- 
tioning in this group of brain injured children 
was different from that found in normal chil- 
dren of the same age. The tests were stand- 
ardized so that the mean of the normal group 
was the same on all measures. Compared to 
the normal group, the brain injured group 
was impaired, but it was not equally impaired 
on all measures. Rather, impairment was 
marked in the cognitive and perceptual-motor 
areas and relatively slight in the area of per- 
sonality functioning. 

The significance of differences in impairment
of the various functions was tested using
the 44 brain injured children for whom there
were complete data on the eight main measures.
Their mean scores, arranged in rank
order, are shown in Tables 16 and 17. The
overall F associated with differences in test
means was significant and both Duncan’s
multiple-range test and Fisher’s least significant
difference method (see, e.g., Federer,
1955) indicated that the tests fell into two
groups, personality and nonpersonality tests.
Whether the alpha level was set at .01 or .05,
differences among nonpersonality tests were
not significant. Differences among personality
tests were also insignificant except for the two
extreme measures which differed at the .05
level. While tests of the nonpersonality group
did not differ from one another and tests of
the personality group differed only at the extremes,
differences between the two groupings
of tests were significant. Three of the four
personality measures differed, at the .01 level,
from three of the four nonpersonality measures.
The bordering means, Block-Sort and
Parent Maladjustment, while not distinguishable
from one another, were distinguishable
from the other three tests of the relevant
group at either the .05 or the .01 level. Personality
tests thus formed one group, which
showed relatively little impairment, and nonpersonality
tests a different group, which
showed relatively great impairment.


TABLE 16

DIFFERENCES AMONG MEAN SCORES OF BRAIN
INJURED SUBJECTS ON VARIOUS TESTS

Source      df     MS      F
Tests         7    1121    7.51*
Subjects     43     725    4.86*
Error       301     149

* p < .01.


TABLE 17

DIFFERENCES BETWEEN MEAN SCORES OF BRAIN INJURED SUBJECTS ON VARIOUS TESTS

Nonpersonality test means                         Personality test means
Perceptual-  Copy-Forms  Vocabulary  Block-Sort   Parent         Brain-   Brain-   Examiner
Motor                                             Maladjustment  Injury   Injury   Maladjustment
35.8         36.0        36.4        39.2         42.3           44.8     46.3     48.6

Note. For Fisher’s least significant difference method 5.18 = p < .05; 6.94 = p < .01. Horizontal lines show insignificantly
different ranges (p > .01) by Duncan’s multiple-range test.
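
The analysis summarized in Tables 16 and 17 has the form of a subjects-by-tests analysis of variance followed by pairwise comparisons of test means. The sketch below assumes the usual textbook computations. The function names are hypothetical, the small data matrix is invented, and the critical t value passed to the least significant difference function is an assumption, since the report does not state the value used.

```python
import math


def subjects_by_tests_anova(data):
    """Two-way analysis without replication; data[i][j] is subject i on test j."""
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    test_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_tests = n * sum((m - grand) ** 2 for m in test_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_error = ss_total - ss_tests - ss_subj
    ms_tests = ss_tests / (k - 1)
    ms_subj = ss_subj / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return {"ms_tests": ms_tests, "ms_subjects": ms_subj, "ms_error": ms_error,
            "f_tests": ms_tests / ms_error, "f_subjects": ms_subj / ms_error}


def least_significant_difference(ms_error, n, t_crit):
    """Smallest difference between two test means significant at t_crit."""
    return t_crit * math.sqrt(2.0 * ms_error / n)


# Invented 3-subject, 3-test matrix, only to show the mechanics.
result = subjects_by_tests_anova([[1, 2, 4], [2, 4, 5], [3, 3, 7]])
print(round(result["f_tests"], 2))

# Using the rounded mean squares printed in Table 16 (44 subjects).
print(round(1121 / 149, 2))
print(round(least_significant_difference(149, 44, t_crit=2.0), 2))
```

With the rounded mean squares the F ratio for tests comes out close to the tabled 7.51. The least significant differences quoted in the note to Table 17 depend on the critical values the authors chose, so the illustrative value of 2.0 is not expected to reproduce them exactly.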

Earlier, we tentatively ascribed a difference 
in impairment of personality and nonperson- 
ality measures to differences in the adequacy 
of the two types of measurements (Graham 
et al., 1962, p. 35). However, measures of 
personality, other than the Examiner Brain- 
Injury ratings, were as reliable as the nonpersonality
measures. They would also appear
to be a valid measure of the kind of person- 
ality characteristic under consideration, since 
it is primarily from the reports of parents and 
the observations of clinicians that a formula- 
tion of the typical brain injured personality 
picture has been derived. 

The pattern of functioning in this group 
of preschool brain injured children not only 
differed from that of uninjured children; it 
also differed from that found after injury to 
the adult brain and from that commonly as- 
sumed to typify the injured child. The com- 
mon clinical descriptions and studies of the 
brain injured child assume that there is a 
pattern of differential impairment similar to 
that found in the adult and, in addition, a
personality syndrome which is possibly a
unique consequence of early injury (Bakwin, 
1949; Bender, 1956; Bradley, 1955; Clements 
& Peters, 1962; Eisenberg, 1957; Laufer, 
Denhoff, & Solomons, 1957; Levy, 1959; 
Silver, 1958; Strauss & Kephart, 1955). Gra- 
ham and Berman (1961) pointed out that 
there was little satisfactory evidence for either 
of these assumptions. 

The present results suggest that the hyper- 
kinetic personality syndrome is not a typical 
picture, at least in a heterogeneous group of 
brain injured children. Hyperactivity, impul- 
sivity, and distractibility were more common 
in our brain injured group than in normal 
children. However, the difference was signifi- 
cant only for Examiner Ratings and not for 
Parent measures, it was no greater than the 
impairment in other personality characteris- 
tics, and it was less than the impairment in 
nonpersonality areas of functioning. 

These findings should be interpreted with 
the important reservation that they apply to 
injuries of heterogeneous etiology occurring 
at different ages between birth and 5.5 years. 
It is possible that the hyperkinetic person- 
ality syndrome, for example, may characterize 
young children with encephalopathies due to 
trauma, infection, and intoxication and not be 
typical of injuries suffered during birth or 
from space-occupying lesions. The present
data are inadequate to answer this question. 

Our results also suggest that the pattern 
of differential impairment which is common 
following injury to the adult brain is not the 
pattern following early injury. The adult pattern
is one of relatively great impairment of
conceptual and perceptual-motor functions
with relatively unimpaired vocabulary per- 
formance. In the present group of brain 
injured children, there was no significant 
difference in the impairment of vocabulary, 
perceptual-motor, and conceptual abilities (see 
Table 16). It appears likely that the effects 
of injury differ with the maturity level of the 
brain at the time of injury. 

There may also be differences in effects 
when injury occurs at different ages during 
the first few years of life. In a previous study 
(Graham et al., 1962), a group of children 
who suffered perinatal anoxia was re-examined 
at 3 years with the present test battery. This 
“possibly injured” anoxic group was, of 
course, less impaired on all functions than 
the present known brain injured group. Of 
greater importance, the pattern of impairment 
differed from that of the present brain injured 
children. The anoxic children, presumably 
injured during or before birth, were signifi- 
cantly impaired in vocabulary ability but were 
not impaired in perceptual-motor ability. In 
contrast, the present brain injured group was 
equally impaired in both areas. The brain 
injured children were also a few years older, 
on the average, at the time of injury. The 
difference in the pattern of impairment might 
be due to the difference in time of injury. If 
vocabulary skill is greatly affected by in- 
jury at birth and progressively less affected 
by injuries at later ages while perceptual- 
motor ability is little affected by injury 
at birth and progressively more affected with 
age, it would be possible to obtain impaired 
vocabulary and unimpaired perceptual-motor 
abilities after birth injuries, equal impairment 
in both functions with injury at some age be- 
tween birth and maturity, and the familiar 
adult pattern of impaired perceptual-motor 
and unimpaired vocabulary abilities. 

There is some support, both from theory 
and from animal experimentation, for assum- 
ing that vocabulary impairment is inversely 
related and perceptual-motor impairment is 
directly related to the age at which injury 
occurs. The studies of Kennard (1938, 1942) 
and Benjamin and Thompson (1959) demon- 
strated that experimental lesions in young 
animals produce less motor and less sensory 
deficit than do comparable lesions in older
animals. Hebb (1949, p. 289 ff.) suggested
that marked impairment in language ability 
would be expected if damage occurred to 
structures before the function had an opportu- 
nity of developing, but that the same destruc- 
tion would not greatly affect these abilities 
once development had occurred. He also felt 
that this would not be true of sensory and 
motor capacities, although he did not elabo- 
rate the reasons for the distinction. Teuber 
and Rudel (1962) have suggested that, par- 
ticularly with simple functions, there may be 
comparatively larger areas of representation 
or greater diffuseness of representation in the 
young brain. Injury to a given area would be 
less crippling in this case but would become 
increasingly disabling as the function became 
localized. 

The age at which an individual is tested 
may be an additional factor of importance in 
determining whether or not impairment is 
evident. Children injured early might, for ex- 
ample, show relatively greater perceptual- 
motor deficits if examined after age 5 than 
they would have shown during the preschool 
years (cf. Fraser & Wilks, 1959; Schachter 
& Apgar, 1959; Thurston, Middelkamp, & 
Mason, 1955). The recent study by Teuber 
and Rudel (1962) is especially significant in 
illustrating the complex relationship between 
development and the effects of brain injury. 
Holding age when injured constant by study- 
ing children injured at or near birth, they 
varied the age of testing between 5 and 18 
years. The performance of both brain injured 
and normal children changed systematically 
with age but differently for different func- 
tions. The degree of impairment was thus not 
static but, rather, depending upon the normal 
and pathologic course of development of par- 
ticular functions, differences between the brain 
injured and normal groups increased, de- 
creased, or remained constant with changes in 
test age. 

Changes in the nature of what is measured 
by a particular test at various ages may explain
another difference between the present
brain injured group and the previously stud- 
ied anoxic children. The anoxic children, un- 
like the present group, showed significantly 
greater impairment in conceptual ability than 
in vocabulary. It was pointed out (Graham
et al., 1962, p. 49) that this finding would
not be anticipated if the concepts test meas- 
ured the ability to acquire new learning after 
injury and vocabulary measured the ability 
to retain information already well learned at 
the time of injury, namely, at birth. However, 
if what is important is not only the degree 
of learning at the time of injury but the de- 
gree of learning or overlearning at the time 
the behavior is measured, the result would be 
understandable. It was felt that impairment
of conceptual ability would continue to be
relatively great in still older children if the
measure of conceptual ability did not call
upon concepts already overlearned, that is, if
it required new learning or new organizing of
data. This implies that a particular test of
concept ability will not be equally sensitive
to impairment at all ages. The children in the
present study were somewhat older on the
average than the anoxic group when tested
and it is probable that for many of the
older children the task required recall of
well-learned categories rather than new organization.

The results of the present study must be 
considered in the light of the relatively small 
sample of brain injured children, the fact that 
impairment probably varies with the kind and 
localization of injury, and the fact that while 
there was a fairly broad sampling of behavior 
there was no effort to test pure factors or to 
investigate specific functions in detail. Allow- 
ing for these limitations, it may be concluded 
that there is little evidence for the generaliza- 
tion that adult patterns of impairment are ap- 
plicable to the young child or for the gener- 
alization that there is a unique personality 
syndrome characteristic of the young brain 
injured child. It appears more likely, from 
the present work and from the studies of Teu- 
ber and Rudel (1962), that there are sys- 
tematic differences in the effects of injury 
depending upon the age at which injury oc- 
curs and that these effects also vary depending 
upon both the particular function measured 
and the age at which it is measured. Differ- 
ences between results of the present study and 
an earlier study of anoxic children suggest 
that impairment in vocabulary may vary in- 
versely with age at the time of injury while 
perceptual-motor impairment may vary directly.
This conclusion is in keeping with the
few relevant animal studies as well as with
Hebb’s efforts to provide a developmental,
theoretical framework for understanding
brain injury effects.

SUMMARY 


The performance of 70 brain injured pre- 
school children was compared with that of 
normal preschool children on measures of vo- 
cabulary, conceptual ability, perceptual-motor 
ability, and personality. The brain injured 
sample was restricted to children above IQ 
50 and the majority were above IQ 69. For
all subjects there was evidence, independent 
of the kind of behavior measured, that the 
brain was damaged at the time of testing. 
Scores on the measures used were adjusted 
for the effects of the irrelevant variables, age,
sex, and socioeconomic status (white private,
white clinic, and Negro), and were trans- 
formed to a scale on which the mean of the 
normal standardization sample was 50 and 
the standard deviation 10.
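
As a rough illustration of the rescaling described above, the following sketch maps raw scores onto a scale on which a normative sample has a mean of 50 and a standard deviation of 10. The raw scores, the function name `to_standard_scale`, and the use of the population standard deviation are assumptions made for the example; the prior adjustment for age, sex, and socioeconomic status is not reproduced here.

```python
from statistics import mean, pstdev


def to_standard_scale(raw_scores, norm_mean, norm_sd,
                      target_mean=50.0, target_sd=10.0):
    """Linearly rescale raw scores so that the normative sample has the
    target mean and standard deviation."""
    return [target_mean + target_sd * (x - norm_mean) / norm_sd
            for x in raw_scores]


# Hypothetical raw scores for a normative sample (illustrative values only).
normal_raw = [12, 15, 9, 14, 10]
norm_mean = mean(normal_raw)
norm_sd = pstdev(normal_raw)  # population SD; an assumption of this sketch

# The normative sample itself lands on mean 50, SD 10.
normal_scaled = to_standard_scale(normal_raw, norm_mean, norm_sd)

# A hypothetical raw score of 8 from another group, placed on the same scale.
other_scaled = to_standard_scale([8], norm_mean, norm_sd)[0]

print(round(mean(normal_scaled), 6), round(pstdev(normal_scaled), 6))
print(round(other_scaled, 1))
```

Because the transformation is linear, a score below the normative mean always falls below 50 on the new scale, which makes group differences directly comparable across measures.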

1. Brain injured children were significantly
inferior to normal children in all areas meas- 
ured and on most of the individual tests. 
Using a composite index of impairment, based 
on the counting of extreme scores, 72.7% of 
the brain injured children were correctly iden- 
tified and only 10.6% of the normal children 
were misidentified. 

2. While brain injured children were im- 
paired in all areas, they were not equally im- 
paired in all areas. Impairment was relatively 
marked on nonpersonality measures and rela- 
tively slight in the area of personality func- 
tioning and the difference was a significant 
one. 

3. The hyperkinetic personality syndrome 
was not a typical picture in this heterogeneous 
group of brain injured children. Although the
syndrome was more common than in normal 
children, other unfavorable personality characteristics
were equally common.

4. The pattern, found after injury to the 
adult brain, of relatively great impairment in 
perceptual-motor and conceptual abilities and 
relatively little impairment in vocabulary, was 
not found in the brain injured children. The
three areas did not differ significantly in the
degree of impairment nor were means in the
predicted direction.

5. It was suggested that there are sys- 
tematic differences in the effects of injury de- 
pending upon developmental factors. There 
may be an inverse relation between age at the
time of injury and vocabulary impairment
and a direct relation between age of injury 
and perceptual-motor impairment. Further, 
effects may vary depending upon the age at 
which an individual is tested and the particu- 
lar functions measured. 


REFERENCES 


Bakwin, H. Cerebral damage and behavior disorders
in children. J. Pediat., 1949, 34, 371-381.

Benda, C. E. Developmental disorders of mentation
and cerebral palsies. New York: Grune & Stratton,
1952.

Bender, L. Psychopathology of children with organic
brain disorders. Springfield, Ill.: Charles C
Thomas, 1956.

Benjamin, R. M., & Thompson, R. F. Differential
effects of cortical lesions in infant and adult cats
on roughness discrimination. Exp. Neurol., 1959,
1, 305-321.

Bradley, C. Organic factors in psychopathology of
children. In P. H. Hoch & J. Zubin (Eds.),
Psychopathology of childhood. New York: Grune &
Stratton, 1955. Pp. 85-97.

Clements, S. D., & Peters, J. E. Minimal brain
dysfunctions in the school-age child. Arch. gen.
Psychiat., 1962, 6, 185-197.

Cochran, W. G., & Cox, G. M. Experimental design.
(2nd ed.) New York: Wiley, 1957.

Eisenberg, L. Psychiatric implications of brain damage
in children. Psychiat. Quart., 1957, 31, 72-92.

Federer, W. T. Experimental design. New York:
Macmillan, 1955.

Fraser, M. S., & Wilks, J. The residual effects of
neonatal asphyxia. J. Obstet. Gynaecol. Brit. Emp.,
1959, 66, 748-752.

Graham, F. K., & Berman, P. W. Current status of
behavior tests for brain damage in infants and preschool
children. Amer. J. Orthopsychiat., 1961, 31,
713-727.

Graham, F. K., Ernhart, C. B., Thurston, D., &
Craft, M. Development three years after perinatal
anoxia and other potentially damaging newborn
experiences. Psychol. Monogr., 1962, 76(3, Whole
No. 522).

Hebb, D. O. The organization of behavior: A
neuropsychological theory. New York: Wiley, 1949.


Kennard, M. A. Reorganization of motor function
in the cerebral cortex of monkeys deprived of motor
and premotor areas in infancy. J. Neurophysiol.,
1938, 1, 477-496.

Kennard, M. A. Cortical reorganization of motor
function: Studies on series of monkeys of various
ages from infancy to maturity. Arch. Neurol.
Psychiat., Chicago, 1942, 48, 227-240.

Laufer, M. W., Denhoff, E., & Solomons, G.
Hyperkinetic impulse disorder in children’s behavior
problems. Psychosom. Med., 1957, 19, 38-49.

Levy, S. Post-encephalitic behavior disorder—a
forgotten entity: A report of 100 cases. Amer. J.
Psychiat., 1959, 115, 1062-1067.

Lindquist, E. F. Design and analysis of experiments
in psychology and education. Boston: Houghton
Mifflin, 1953.

Pond, D. A. Psychiatric aspects of brain-damaged
children. Brit. med. J., 1961, 2, 1454-1459.

Schachter, F. F., & Apgar, V. Perinatal asphyxia
and psychological signs of brain damage in childhood.
Pediat., 1959, 24, 1016-1025.

Silver, A. A. Behavioral syndrome associated with
brain damage in children. Pediat. Clin. N. Amer.,
1958, 6, 687-698.

Smith, H. F. Interpretation of adjusted treatment
means and regressions in analysis of covariance.
Biometrics, 1957, 13, 282-308.

Strauss, A. A., & Kephart, N. C. Psychopathology
and education of the brain-injured child. New
York: Grune & Stratton, 1955.

Teuber, H.-L., & Rudel, R. G. Behaviour after cerebral
lesions in children and adults. Develpm. Med.
Child Neurol., 1962, 4, 3-20.

Thurston, D. L., Middelkamp, J. N., & Mason, E.
The late effects of lead poisoning. J. Pediat., 1955,
47, 413-423.


(Received December 3, 1963) 


Vol. 77, No. 12 


Whole No. 575, 1963 


Psychological Monographs: General and Applied 




TEMPORAL NUMEROSITY AND THE PSYCHOLOGICAL 
UNIT OF DURATION ¹


CARROLL T. WHITE


United States Navy Electronics Laboratory, San Diego, California 


Functions relating the perceived number of flashes to the number of flashes
presented were obtained for stimulus presentation rates varying from 10 to
30 flashes per sec. These relationships, called temporal numerosity functions,
provide information regarding the rate of increase in the perceived number
and indicate the existence of critical points in time following the onset of
stimulation. It is assumed that these findings describe the cuties character-
istics of a basic central process underlying perception. The results of these
studies, and similar studies with other sense modalities, are considered in
relation to the concept of the psychological unit of duration, or moment, and
also in relation to certain neurophysiological processes whose temporal
characteristics are markedly similar to those exhibited by the perceptual results.


THE concept of time has always held
an important place in philosophy and
science. Of particular interest, for present
purposes, are the questions which have arisen 
concerning the perception of time and the role 
of time in perception. 

One very persistent idea has been that there 
are natural units of duration, which are a func- 
tion of the organism itself. William James 
discussed this at length in The Principles of 
Psychology (James, 1890, pp. 605-642), re- 
ferring to it as “the law of time’s discrete 
flow.” It is interesting to note, in view of later 
developments, that he invoked the concept of 
“waxing and waning brain processes” to ac- 
count for this discontinuity. He went so far 
as to suggest that without such processes we 
could not have a conception of time at all. 


1 Adapted from a dissertation submitted to the 
Department of Psychology, Brown University, in 
partial fulfillment of the requirements for the doctor 
of philosophy degree. The writer is greatly indebted 
to H. Schlosberg, professor, for the aid and
encouragement given during the course of this work. He
would also like to acknowledge the invaluable as- 
sistance provided by E. H. Kemp, M. Lichtenstein, 
and J. A. Hoke, all of the United States Navy 
Electronics Laboratory. 

This manuscript is submitted with the understand- 
ing that a right of reproduction for governmental 
purposes is reserved for the United States Navy 
Electronics Laboratory. The opinions and assertions 
contained herein are the private ones of the writer 
and are not to be construed as official, or as reflecting 
the views of the Navy Department or the naval
service at large.


James discussed the possibility that different
organisms may have entirely different
duration units, and that under certain condi- 
tions these units might vary within a given or- 
ganism. Quoting from Von Baer, he then 
speculated about the perceptual experiences of 
hypothetical beings whose units of duration 
are vastly different from ours: 

Suppose we were able, within the length of a 
second, to note 10,000 events distinctly, instead of 
barely 10, as now; if our life were then destined 
to hold the same number of impressions, it might 
be 1000 times as short. We should live less than a 
month, and personally know nothing of the change of 
the seasons. . . . Now reverse the hypothesis, and 
suppose a being to get only 1000th part of the 
sensations that we get in a given time, and consequently
to live 1000 times as long. Winters and
summers will be to him like quarters of an hour. 
Mushrooms and the swifter-growing plants will shoot 
into being so rapidly as to appear instantaneous 
creations; annual shrubs will rise and fall from the 
earth like restlessly boiling-water springs; the mo- 
tions of animals will be as invisible as are to us the 
movements of bullets and cannon-balls; the sun 
will scour through the sky like a meteor leaving a 
fiery trail behind him, etc. [p. 639]. 

He then described the time distortion pro- 
duced by certain drugs, such as hashish, and 
suggested that this might conceivably repre- 
sent an approach to the condition of the short- 
lived beings described above; that is, these 
drugs might act to shorten the duration unit. 
If the subjective length of each unit were un- 
changed, then a person would perceive ex- 
ternal events as in “slow motion.” 


Henri Bergson is perhaps best known for 
his writings on the problems involved in the 
concept of time. His concept of time in terms 
of natural units of duration (or “moments,” 
as he sometimes called them) is now often 
referred to as Bergsonian time to distinguish 
it from Newtonian time, in which time is con- 
ceived of as a smooth-flowing continuum. 

The development of motion pictures pro- 
vided Bergson with a model by means of 
which he could express his ideas in more con- 
crete terms. In his book Creative Evolution, 
first published in 1908, he included a section 
entitled “The Cinematographical Mechanism 
of Thought, and the Mechanistic Illusion” 
(Bergson, 1913, pp. 272-314). At one point 
in this chapter he discussed the difficulty which 
would be involved in trying to re-create a “liv- 
ing picture,” such as that of a regiment march- 
ing past. After considering various techniques, 
all very impractical and difficult, he pointed 
out that there is a way of doing it:


It is to take a series of snapshots of the passing 
regiment and to throw these instantaneous views on 
the screen, so that they replace each other very 
rapidly. This is what the cinematograph does. With 
photographs, each of which represents the regiment 
in a fixed attitude, it reconstitutes the mobility of 
the marching regiment. . . . Such is the contrivance 
of the cinematograph. And such is also that of our 
knowledge. Instead of attaching ourselves to the 
inner becoming of things, we place ourselves outside 
them in order to recompose their becoming artificially.
We take snapshots, as it were, of the passing
reality, and, as these are characteristic of the 
reality, we have only to string them on a becoming, 
abstract, uniform and invisible, situated at the back 
of the apparatus of knowledge, in order to imitate 
what there is that is characteristic in the becoming 
itself. Perception, intellection, language so proceed 
in general. Whether we would think becoming, or 
express it, or even perceive it, we hardly do any- 
thing else than set a kind of cinematograph going 
inside us. We may therefore sum up what we have 
been saying in the conclusion that the mechanism of 
our ordinary knowledge is of a cinematographical 
kind . . . of the altogether practical character of this 
operation there is no possible doubt. . . . The cine- 
matographical method is the only practical method. 
. . . Action is discontinuous, like every pulsation of 
life; discontinuous, therefore, is knowledge. The mech- 
anism of the faculty of knowing has been constructed 
on this plan [pp. 304-307]. 


The concept of natural units of duration 
was an integral part of the theories concerning
physiological pacemakers. The reality of
such temporal units was accepted as a fact;
the question was by what process they came 
about. Hoagland (1935) wrote: 


A more likely basis for a subjective time scale 
would appear to be not an overt form of highly 
variable motor activity, such as the pulse or respira- 
tory movements, but rather a chemical mechanism, 
perhaps associated with the continuous respiration 
of cortical nervous tissue. With this hypothesis in 
mind, experiments were made to determine the effect 
of internal body temperatures on judgments of short 
durations. 

The idea of making these experiments came essen- 
tially as a product of adversity. It resulted from 
the fact that my wife, having fallen ill with influenza, 
was impressed with the fact that time seemed to pass 
very slowly. It occurred to me that this might be 
due to an elevated temperature from fever since, if 
some form of chemical reaction acted as a timing 
mechanism, the increased body temperature should 
make the “chemical clock” run faster. This should 
make time, as judged by objective clocks, appear to 
go by more slowly, since more psycho-physiological 
“time” would pass in a given constant unit (a 
minute, let us say) of our conventional constant
time scale standard [p. 107]. 


Hoagland then went on to describe how he 
had his wife estimate the duration of a minute 
(by counting to 60 to herself at a rate of 
what she believed to be 1 per second) at 
various stages of her illness. He found that 
the actual duration of the count varied in- 
versely with her temperature, ranging from 
52 seconds at 97.4° F. to 37.5 seconds at 
103° F. This same effect was found for 
another person with influenza, and with a sub- 
ject whose internal temperature was artifi- 
cially raised by means of diathermy. He 
concluded: 

Since physiological time passes more rapidly at 
higher temperatures, while physical time continues 
at a constant rate, we should expect time to appear 
to pass more slowly during a fever than when at 
rest. This appears to be a common experience. . . . 
Psychological time thus seems to depend directly upon 
the velocities of certain definite, chemical processes, 
the psychological and physiological events forming 
different aspects of the same thing. Measurements 
of the estimations of short durations as a function 
of temperature indicate the existence of a master 
chemical reaction, possibly the slowest of the series 
of essentially irreversible processes involved in the 
respiration of certain parts of the brain [p. 116]. 
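
The counting data reported above can be restated as rates. The sketch below is only a worked piece of arithmetic on the two values quoted in the text (a count of 60 taking 52 seconds at 97.4° F. and 37.5 seconds at 103° F.); the function and variable names are hypothetical, and no claim is made about the form of the underlying temperature relation.

```python
def counting_rate(counts, elapsed_seconds):
    """Counts produced per second of clock time."""
    return counts / elapsed_seconds


COUNT_TARGET = 60  # the subject counted to 60 at what felt like 1 per second

# Body temperature (deg F) -> clock seconds actually taken, as quoted above.
observations = {97.4: 52.0, 103.0: 37.5}

rates = {temp: counting_rate(COUNT_TARGET, secs)
         for temp, secs in observations.items()}

# Ratio of the counting rate at the higher temperature to that at the lower.
speedup = rates[103.0] / rates[97.4]

for temp in sorted(rates):
    print(f"{temp} F: {rates[temp]:.2f} counts per second")
print(f"speedup: {speedup:.2f}")
```

On these figures the count ran at about 1.15 per second at the lower temperature and 1.60 per second at the higher one, a ratio of roughly 1.39, which is the sense in which the internal clock is said to run faster with fever.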


It is of interest to note that a few years 
prior to Hoagland’s studies, Francois (1927), 
working under Piéron in France, had also 
studied the time estimation of subjects whose
temperature was varied by means of diathermy.
His results were essentially the same
as Hoagland’s. The subjects were trained to 
tap a telegraph key at a constant rate of once 
per second. When they were then subjected 
to diathermy, it was found that their rate of 
tapping varied directly with their internal 
temperature. 

In the preface to his book The Sensations: 
Their Functions, Processes and Mechanisms, 
Piéron (1952) points out that the main 
emphasis of the work done by himself and his 
co-workers has been to determine the relation- 
ship of time to sensation. In the section of 
that book entitled “Temporal Aspects of 
Sensation” (pp. 247-309), he discusses the 
results of this work, as well as the results 
obtained by others working in this area, in 
considerable detail. Of particular interest in 
respect to our present problem is his con- 
clusion: 

Units of time exist... . The time units we use 
in our actions are functions of the speed of deep 
seated organic processes conditioning metabolism and 
various vital activities, such as the propagation of 
the nervous discharge. There is a time appropriate 
to the organism, of which the units are a function 
of the speed of the biological processes. Now this 
speed is essentially controlled by the temperature 
level, on which depends the chemical activity form- 
ing the sub-stratum of life [p. 293]. 


From this we can see that Piéron is in 
basic agreement with Hoagland. He also 
anticipates later ideas on this subject by point- 
ing out that the alpha rhythm and beta 
rhythm vary with internal temperature. 

In a footnote to the passage quoted above, 
Piéron mentions the fact that we have ways of 
correlating to our individual time certain 
phenomena which evolve so rapidly or slowly 
that we cannot observe them directly. This 
is by means of time-lapse and high-speed 
photography. More general terms for these 
techniques which are now coming into use 
are “time compression” and “time expansion,” 
terms which encompass other means of record- 
ing and display in addition to photography. 
New techniques, such as magnetic tape record- 
ing, can also, of course, be used for presenting 
stimuli to sense modalities other than vision. 
With such techniques we are, in a sense, 
simulating the perceptual experiences of the
short-lived and long-lived beings hypothesized
by James. 

The concept of the natural unit of duration 
is also to be found in cybernetics theory, but 
in this case the emphasis is somewhat differ- 
ent. Wiener (1948) concluded that in the 
operation of the brain some sort of scanning 
mechanism would be necessary: 


The scanning apparatus should have a certain 
intrinsic period of operation which should be identi- 
fiable in the performance of the brain. . . . While 
this cyclical process then might be a locally determined
one, there is evidence that there is a widespread
synchronism in different parts of the cortex,
suggesting that it is driven from some clocking 
center. In fact, it has the order of frequency appro- 
priate for the alpha rhythm of the brain, as shown 
in electroencephalograms. We may suspect that this 
alpha rhythm is associated with form perception, 
and that it partakes of the nature of a sweep 
rhythm, like the rhythm shown in the scanning 
process of a television apparatus. It disappears in 
deep sleep, and seems to be obscured and overlaid 
with other rhythms, precisely as we might expect, 
when we are actually looking at something, and the 
sweep rhythm is acting as something like a carrier 
for other rhythms and activities [pp. 165-166]. 


Our unit of duration is thus viewed, in 
Wiener’s approach to the matter, as the period 
of a hypothetical scanning process. Stroud 
(1948), writing at about the same time, took 
a similar approach. He christened the dura- 
tion unit the “moment,” and it seems that this 
name might become accepted since most other 
workers in this area since that time have 
tended to use it. Earlier writers, such as 
Bergson, had also used this term in referring 
to the psychological unit of duration, but used 
other terms as well, so the naming of the con- 
cept can rightly be attributed to Stroud. Most 
of his earlier work was not published in easily 
obtainable sources, but more recently a rather 
thorough statement of his idea has appeared 
(Stroud, 1955). Some of the properties of 
Stroud’s moment are as follows: 

Physical time t is represented in the experience of 
man as psychological time T. ... T is not a con- 
tinuous variable. . . . [This] can be interpreted as 
a result of a scanning process. Each scanning proc- 
ess—be it one of integration or one of sampling— 
or of any other kind, reduces the number of dimen- 
sions of representation. It loses all information con- 
tained in the order of the dimension eliminated by 
the scan over the period of the scan. The only order 
information left in this dimension lies in the order
of the elements of the series of representations produced
by the scanning process.

The properties of moments, so far defined, could 
not contain movement as a property of the content 
of the moment, for movement is change with respect 
to time. . . . Psychological movement (apparent 
movement) ... is in the nature of an inference 
based upon differences between moments. 

Seen physical events, which differ from one an- 
other only by the dates of their occurrence, are 
differentiated in psychological time at a maximum
rate of one per moment [pp. 177-181]. 


On the basis of various findings reported in 
the literature, Stroud concludes that the 
period of the moment can range from about 
50 to 200 milliseconds, with the most fre- 
quent value being around 100 milliseconds. 
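
The property that successive events are differentiated at a maximum rate of one per moment can be illustrated with a deliberately simple toy model. In the sketch below, time is cut into fixed moments beginning at stimulus onset, and the perceived number of flashes is taken to be the number of moments that contain at least one flash. The fixed period, the alignment of the first moment with onset, the hard boundaries, and all function names are assumptions of this sketch; it is not offered as Stroud's formulation or as a description of any reported result.

```python
from fractions import Fraction


def moment_rate_per_sec(moment_ms):
    """Maximum number of distinguishable events per second for a given moment."""
    return 1000.0 / moment_ms


def flash_times_ms(n_flashes, rate_per_sec):
    """Onset times (ms) of evenly spaced flashes, kept exact with Fraction."""
    interval = Fraction(1000, rate_per_sec)
    return [i * interval for i in range(n_flashes)]


def perceived_count(event_times_ms, moment_ms=100):
    """Number of distinct moments that contain at least one event."""
    return len({t // moment_ms for t in event_times_ms})


# Moment periods of 50, 100, and 200 ms expressed as limiting rates.
for period in (50, 100, 200):
    print(period, "ms ->", moment_rate_per_sec(period), "per second")

# Ten flashes presented at 10, 20, and 30 per second, 100 ms moment assumed.
for rate in (10, 20, 30):
    print(rate, "per second ->", perceived_count(flash_times_ms(10, rate)))
```

Under these assumptions a 50 to 200 millisecond moment corresponds to a ceiling of 20 to 5 distinguishable events per second, and ten flashes are all counted at 10 per second but only five and four of them at 20 and 30 per second.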

The scanning process hypothesized by 
Wiener and Stroud is directly related to the 
earlier theoretical work of Pitts and Mc- 
Culloch (1947), and their “scansion” concept. 
Although Stroud is careful not to attempt to 
relate his moment to any particular physio- 
logical process, but treats it as a model of 
perceptual behavior, other workers in this 
area have openly suggested that it is in some 
way related to the process underlying the 
alpha rhythm (Murphree, 1954; Walter,
1950, 1959; Wiener, 1948). 

The most explicit statement of this latter 
view is that made by Walter (1950). He 
argues that the principle of parsimony would 
suggest a scanning mechanism of some sort. 
He draws an analogy to a television system, 
in which the scanning process makes it possi- 
ble to transmit over a single radio channel 
information which otherwise would require 
over 100,000 channels. But, as he points out, 
a price must be paid for this parsimony. First, 
reception speed would be limited to the scan- 
ning frequency (or the frame frequency in 
the television analogue); changes in space 
which occur more frequently than this or 
movements which cover an appreciable dis- 
tance in the time of one scan period would lead 
to blurring and distortion of the pattern. Sec- 
ond, intermittent signals which are shorter 
than the duration of the scan but recur about 
its frequency would give the illusion of move- 
ment. 

Walter (1950) then poses the question as 
to whether there is any process in the brain 
having at least the temporal characteristics
of the hypothetical scanning process he has
been describing, and concludes: 

We have the alpha rhythm of the EEG. . . . I
was anxious to show that the existence of some such 
rhythm could be predicted without recourse to 
electronic gadgets from facts which have been known 
for over a generation. . . . The alpha rhythm, un- 
expected, unwanted and disinherited by physiologists, 
is now familiar enough. . . . That its name was be- 
stowed by a psychiatrist and not by a physiologist 
may, of course, be a handicap, but let us see whether 
we can find in this phenomenon at least some ca- 
pacity for doing useful work [p. 10]. 


One of the most recent discussions of the 
“discontinuity hypothesis” is that by Mc- 
Reynolds (1953). He says, in part: 

We assume, then, a succession of neural discharge 
patterns as underlying transient mental functions. 
I will refer to these successive patterns as moments 
(Ms). This is after the work of Stroud . . . though 
my usage of the word is slightly different from his. 
The phenomenal counterpart of a M may be termed 
a percept, and moment sequences will be used to refer 
to any number of M’s which occur successively 
[p. 320]. 


McReynolds points out that the discon- 
tinuity hypothesis is by no means a new one 
and refers to a number of the items which 
have been discussed in the preceding para- 
graphs. He also describes a number of empiri- 
cal findings from the literature, including both 
perceptual and motor phenomena, which seem 
to indicate a periodicity such as he hypothe- 
sized. He feels that the alpha rhythm is sug- 
gestive in this regard, as is the Bartley effect 
(the brightness enhancement of a light flick- 
ering at the rate of around 10 per second), 
but indicates that none of the earlier studies 
can really be used as a test of the hypothesis 
since none were specifically carried out for 
this purpose, and were done under such dif- 
ferent conditions that their results are diffi- 
cult to compare. He does feel, however, that 
the aggregate of the bits of information from 
the earlier studies he cites certainly justifies 
“straightforward experiments crucial to the 
major hypothesis.” 

On reviewing the ideas which have been 
put forth regarding the concept of a psycho- 
logical unit of duration, perhaps the most 
striking thing to note is that although the 
various writers have arrived at essentially the 
same conclusion, they have been compelled
to do so by quite different reasons. The philosophers
believed that some kind of time
sampling was necessary in order for one to 
detect change, or even to have an idea of 
time itself; the physiologists, as exemplified 
by the pacemaker theorists, believed that 
all vital processes are cyclic in nature, gov- 
erned by basic metabolic factors; the cyber- 
neticists, considering the brain as a computer, 
concluded that it must operate on a scanning 
principle in order to handle the volume of 
information it does. The fact that these dif- 
ferent approaches have led to the same con- 
clusion would indicate that such a conclusion 
deserves serious consideration. 


EXPERIMENTAL BACKGROUND 


Two definite predictions could be made 
regarding temporal efficiency (or perhaps 
acuity) in perception if there were indeed 
a periodicity in the perceptual process: 
(a) Temporally discrete events occurring dur- 
ing any one unit of duration should be per- 
ceived as simultaneous; and (b) there should 
be a definite limit to the perceived rate of 
stimulation. It can be seen that these are 
merely two different ways of expressing the 
same thing, one in terms of the period and 
the other in terms of the rate of the hypothe- 
sized process. The general consensus of those 
writing on this topic is that the duration 
unit is equal to about 100 milliseconds, but, 
as McReynolds points out, this estimate has 
been based on incidental observations and 
on bits of information from various studies, 
none of which had been designed to test the 
hypothesis in question. 

There are two types of experiments which 
seem to be well suited to test the predictions 
of the discontinuity hypothesis. These may 
be referred to as (a) tests of perceptual simul- 
taneity, and (b) tests of perceptual rate. The 
background, rationale, and results to date of 
these types of studies will now be discussed. 


Tests of Perceptual Simultaneity 


This general category of test is one of the 
oldest in experimental psychology. A good 
part of the work done in Wundt’s laboratory,
for example, was in regard to this problem
(Boring, 1942). Piéron (1952) states: 

The range over which the apparent simultaneity 
of disparate impressions can fluctuate is a little more 
than a tenth of a second for the comparison of visual 
and auditory sensations, according to Michotte’s 
determinations (1912). It is of the same order of 
size between sight and touch, and a little less between 
touch and hearing [p. 296]. 


The results of these early studies cannot 
be used directly in determining the zone of 
simultaneity since the exact values of the 
various parameters were not specified accurately.
This is especially critical in the intersensory
studies, such as Michotte’s, because
here we have the added complication of dif- 
ferences in latency. However, they do serve 
to indicate the approximate values we might 
expect in more rigorously controlled studies, 

A type of study which seems to come closer 
to meeting the requirements for our purpose 
is that done by Stein in 1928, and reported 
by Woodworth (1938, p. 689). By means 
of a specially designed tachistoscope, Stein 
could expose the letters of a word in rapid 
sequence. He found that if all the letters of 
a word were exposed within a total time of 
100 milliseconds that word was perceived just 
the same as when all the letters were exposed 
simultaneously. He could even expose the 
letters in reverse order without the subject 
being aware that any change had been made. 
Woodworth claimed that these surprising re- 
sults “could be explained by the known facts 
of retinal lag.” In other words, Woodworth be- 
lieved that a peripheral, rather than a central, 
process was involved. 

An improvement over Stein’s technique 
would be one in which segments of a com- 
posite form (such as letters of a word), or a 
series of different forms can be presented in 
sequence, it being possible to repeat this se- 
quence continuously at any desired rate. By 
means of this technique it is possible to avoid 
any transient effects which may accompany 
the onset of visual stimulation and which 
could conceivably affect the perception of the 
stimuli which are presented. With an appara- 
tus designed to present stimuli in the manner 
described, an observer could, for example, 
establish a threshold of simultaneity by adjusting
the rate of stimulus presentation until
the point is reached where all the segments
of a composite form seem to be appearing at 
the same time. 

Murphree (1954), using a technique such 
as just described, carried out a series of 
studies which were designed specifically to 
test the cyberneticists’ hypothesis that the 
neural processes underlying the alpha rhythm 
are the basis for the loss in temporal acuity 
in rapid successive form perception—thus 
being related to their perceptual “scanning” 
mechanism. Murphree pointed out that a 
number of testable predictions could be made 
on the basis of such a hypothesis, such as: 
(a) that separate form images may not be 
perceived at a rate faster than the alpha 
frequency for a particular individual; 
(b) that the rate of form presentation at 
which apparent motion ceases (threshold of 
simultaneity) will be proportionate to the 
alpha frequency; and (c) that segments of a 
composite form, when presented sequentially 
at or above alpha frequency, will be seen as 
fused into that composite form with all ele- 
ments simultaneously present. 

In his experiments Murphree used 50 sub- 
jects, patients in a veterans’ hospital, whose 
basic alpha frequencies ranged from 8 to 13 
cps. These subjects were chosen on the basis 
of the results of a standard electroencephalo- 
gram (EEG) examination given to all pa- 
tients. The experimenter himself, however, 
did not know the basic alpha frequency for 
any particular subject until the studies were 
completed. 

On the basis of the results of a number of 
experiments, each designed to test one of the 
predictions, Murphree concluded that “there 
is a significant relation between the alpha 
rhythm and the temporal aspect of rapid suc- 
cessive spatial perceptions.” For example, he 
found that the mean “span of simultaneity” 
for all 50 subjects was 95 milliseconds, while 
the mean span (period) of their alpha cycle 
was 98 milliseconds. 
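
The comparison between an alpha frequency stated in cycles per second and a span stated in milliseconds rests on the reciprocal relation between frequency and period. A minimal Python sketch of that conversion is given here; the function names are illustrative only and do not come from Murphree's report.

```python
def period_ms(frequency_cps: float) -> float:
    """Return the period, in milliseconds, of a rhythm given in cycles per second."""
    if frequency_cps <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / frequency_cps


def frequency_cps(period_in_ms: float) -> float:
    """Return the frequency, in cycles per second, of a rhythm given its period in milliseconds."""
    if period_in_ms <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / period_in_ms
```

Under this relation the reported alpha range of 8 to 13 cps corresponds to periods of 125 to about 77 milliseconds, and a mean alpha period of 98 milliseconds corresponds to roughly 10.2 cps.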

At this point it might be well to make it 
quite clear that the basic hypotheses regard- 
ing the psychological unit of duration (or 
moment) do not specify the process by which 
it comes about. Having deduced what the 
probable temporal characteristics of such a
duration unit might be, many writers have
noted that the alpha rhythm has such char- 
acteristics. Subsequent investigations may 
show that this correspondence is purely for- 
tuitous, in spite of suggestive findings such 
as those of Murphree. For this reason it is 
advisable to keep the alpha rhythm out of 
the primary hypothesis. Even if future re- 
search were to show that changes in an indi- 
vidual’s alpha frequency, brought about by 
changes in internal temperature or by the 
administration of certain drugs, corresponded 
to changes in his span of simultaneity or some 
similar measure, this still would not prove 
that a causal relationship existed. 

Further information regarding perceptual 
simultaneity has been obtained in a study by 
Lichtenstein (1961). The technique of stimu- 
lus presentation used in this study was simi- 
lar in principle to that used by Murphree, in 
that a composite form was broken down into 
segments, these segments then being presented 
sequentially at a variable rate. The rate of 
presentation was controlled by the observers. 
The composite form in this case was a simple 
geometrical dot pattern consisting of four 
back-illuminated points of light. Each point 
of light appeared alone in sequence. With 
each such pattern the observers were in- 
structed to vary the rate of presentation until 
that point was reached when all the dots in 
the pattern appeared to be flashing simultane- 
ously. The observers were given unlimited 
time to make each setting and were instructed 
to bracket the critical region by increasing 
and decreasing the presentation rate before 
making each setting. 

Lichtenstein found that large variations in 
the temporal spacing of the dots within the 
pattern cycle did not change the cycle dura- 
tion at the threshold of simultaneity. He 
tried various temporal patterns, from regu- 
lar spacing (one-fourth cycle between dots) 
to an extreme situation wherein five eighths of 
the total cycle was interposed between two 
successive dots. 

Another variation of the perceptual
simultaneity type of experiment has been
tried and should prove to be useful in a
number of ways. This could be called the synchronization
technique. In this type of study
a well-practiced subject attempts to synchronize
his finger-tapping with a repetitive stimulus
which is presented to him. The optimum
stimulus rate has not yet been determined, 
but two per second seems to be quite satis- 
factory. It is, of course, a very simple matter 
to record the onset of each stimulus along 
with the tapping responses. By recording the 
responses only at those times the subject in- 
dicates that he is synchronized with the stimu- 
lus, a distribution of the time of his taps rela- 
tive to the actual onset of the stimuli can 
readily be obtained. The variability of this 
distribution might provide an estimate of the 
subject’s span of simultaneity. This seems 
to follow from the fact that the distribution 
is based on responses which the subject re- 
ported as synchronized with the stimulus. 
The most meaningful measures of variability 
for this purpose must be determined. It is felt 
that a measure such as the range of devia- 
tions which includes the middle 90% of the 
responses might be suitable. 
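
The measures mentioned above (constant error, variability, and a range covering the middle 90% of the responses) can be sketched in Python as follows. This is only one possible reading of the proposal: the input is assumed to be a list of tap-minus-stimulus-onset deviations in milliseconds, the function name is hypothetical, and the 5th and 95th percentiles are taken with the "inclusive" interpolation rule of the standard library, which is a choice made here rather than one stated in the text.

```python
import statistics


def tap_deviation_summary(deviations_ms):
    """Summarize deviations (tap time minus stimulus onset, in ms).

    Returns the constant error (mean deviation), the standard deviation,
    and the width of the interval holding the middle 90% of responses.
    """
    data = sorted(deviations_ms)
    if len(data) < 2:
        raise ValueError("at least two deviations are needed")
    # n=20 yields 19 cut points; the first is the 5th percentile,
    # the last is the 95th percentile.
    cuts = statistics.quantiles(data, n=20, method="inclusive")
    low, high = cuts[0], cuts[-1]
    return {
        "constant_error_ms": statistics.mean(data),
        "sd_ms": statistics.stdev(data),
        "middle_90_range_ms": high - low,
    }
```

Only taps recorded while the subject reports being synchronized would be passed to such a function, in keeping with the procedure described.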

The synchronization technique was used by 
Lichtenstein and White (1961) in an attempt 
to determine the relative latency of response 
to a light stimulus presented in various parts 
of the visual field. A neon bulb flashing at 
the rate of two per second was presented at 
49 positions in the visual field of the right 
eye for each subject: the central fovea and 
six distances out from the fovea, up to 40°, 
along each of the eight major meridians. It 
was found that the constant error of the re- 
sponse distributions tended to increase as the 
stimulus was moved out from the fovea. These 
results correspond to findings regarding the 
relative latency of response to stimuli pre- 
sented at various positions in the visual field 
which are obtained when the more conven- 
tional reaction-time technique is employed. 
(The relative latency was determined for any 
given retinal locus of stimulation by subtract- 
ing the constant error of the distribution of 
taps for foveal stimulation from the constant 
error for that locus of stimulation.) The variability
was essentially the same for all 49
positions, the mean standard deviation being
about 40 milliseconds for both of the subjects
tested.

2 It can be seen that the technique described is related
to that used by Sweet (1953) when he studied
relative latencies for various retinal locations. The
results obtained are also similar to his findings.


Tests of Perceptual Rate 


It has long been known that when one ob- 
serves a flickering light source the apparent 
rate of flicker is not necessarily equal to the 
actual rate. In 1937 LeGrand reported, 
the apparent rhythm of the flicker seems lower than
the actual frequency, and the more so the more
laterally the source is observed.


He then pointed out that, for example, a light 
of a certain intensity flickering at the rate 
of 42 per second appears to be flickering at 
the rate of around 7 per second when one 
fixates a spot 15° to one side of it, the back- 
ground being dimly illuminated. If the fixa- 
tion point is moved farther from the light 
source the apparent rate of flicker decreases 
even more (LeGrand, 1937). 

Bartley (1951) was very much interested 
in this phenomenon and did a considerable 
amount of work on it. He attempted to quan- 
tify the apparent rate of flicker by matching 
it with a second flickering source. As a re- 
sult of these studies he decided that this 
“residual flicker,” as he called it, remained 
at a steady rate of about 20 per second, re- 
gardless of actual flash rate or flicker fusion 
value. He believed that this limiting of per- 
ceptual rate was brought about by the “intrin- 
sic discharge characteristic of the retinal gan- 
glion cells.” 

Piéron (1952, pp. 307-308) indicated that 
he believed this phenomenon might be an 
important clue to the nature of certain basic 
perceptual processes. He was critical of Bart- 
ley’s attempts to quantify the apparent rate 
of flicker, pointing out that “the equalization 
of two disparate stimulations does not repre- 
sent in the strict sense an evaluation.” This 
criticism seems to be entirely justified, since 
it is hard to believe that a process which is 
affecting the perception of one of a pair of 
flickering lights would not also be affecting 
the other. Piéron concluded his discussion on 
this topic by stating: 

One is led to think that the evaluated rhythm is
that of perceptive reactions synchronizing themselves
with some of the intermittent stimulations.
The problem of the mechanism involved is difficult
because of the uncertainty of the numerical evaluations
which are intellectualized, whilst resort is not
made to the artificial method of the verbal numeration
of the distinguishable stimuli, a procedure inapplicable
at frequencies greater than the extreme
limit of 10 per second [p. 308].


In spite of Piéron’s pessimism, a number 
of studies have been carried out involving the 
“verbal numeration of the distinguishable 
stimuli” at rates higher than 10 per second.
These studies have been, however, of a type
different from that which Piéron envisioned. Instead of
having the observer attempt to count the per- 
ceived stimuli produced by an intermittent 
source presented continuously, in these studies 
he is presented short trains of successive 
stimuli, at various rates, and asked to report 
the number of stimuli perceived in each such 
stimulus train. 

The first study of this kind was carried out 
by Cheatham and White (1952). Trains of 
visual stimuli consisting of various numbers 
of flashes recurring at rates of 10, 15, 22.5 
and 30 per second were presented to their 
subjects, who reported the number of flashes 
perceived following the presentation of each 
such stimulus train. The duration of each 
flash was always 10 milliseconds, regardless 
of the flash rate used. The stimulus was a 
circular spot .6 centimeter in diameter, viewed 
from a distance of 80 centimeters, whose 
luminance was 1,400 ft-L. This stimulus was 
viewed foveally against a background of 1.8 
ft-L. The results obtained indicated that 
the number of flashes reported by the sub- 
jects depended primarily on the time it took 
to present a stimulus sequence and not on 
the number of stimuli in that sequence. Per- 
haps the most impressive aspect of the data 
was the extreme reliability of the responses 
to certain number-rate sequences. For ex- 
ample, all the subjects always reported hav- 
ing seen 2 flashes whenever 5 flashes at 30 
per second were presented to them, and they 
always reported having seen 3 flashes when- 
ever 10 flashes at 30 per second were pre- 
sented. When the perceived number of flashes 
was plotted as a function of time it was noted 
that a distinct plateau in the function occurred 
at a point about 300 milliseconds from the 
time of onset. When the function once again
began to rise, the slope was definitely less
than at the beginning. The overall slope of 
the function was found to correspond to a rate 
of about 6-7 per second. This was interpreted 
to be the maximum “perceptual rate” for the 
conditions of the experiment. Further investi- 
gation showed that the intensity of the flash- 
ing light could be varied over a wide range 
without affecting the perceptual results in any 
way. 
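
The notion of a "perceptual rate" as the slope of perceived number against time can be illustrated with a short Python sketch. It uses only the two number-rate results quoted above (5 flashes at 30 per second reported as 2, and 10 flashes at 30 per second reported as 3). The function names are illustrative, the train duration is taken here as the number of flashes divided by the rate (an assumption; a constant offset in the time axis would not change the slope), and an ordinary least-squares fit stands in for whatever method the original authors used.

```python
def train_duration_s(n_flashes: int, rate_per_s: float) -> float:
    """Duration of a stimulus train, taken as n_flashes full cycles (assumption)."""
    return n_flashes / rate_per_s


def perceptual_rate(points):
    """Least-squares slope of perceived number against time.

    points: iterable of (time_in_seconds, perceived_number) pairs.
    Returns the slope in perceived units per second.
    """
    pts = list(points)
    if len(pts) < 2:
        raise ValueError("at least two points are needed")
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_n = sum(n for _, n in pts) / len(pts)
    num = sum((t - mean_t) * (n - mean_n) for t, n in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    if den == 0:
        raise ValueError("all time values are identical")
    return num / den


# The two results quoted in the text, both at 30 flashes per second.
quoted = [
    (train_duration_s(5, 30), 2),
    (train_duration_s(10, 30), 3),
]
slope = perceptual_rate(quoted)  # perceived flashes added per second
```

For these two points alone the slope works out to 6 per second, which is in line with the overall value of about 6-7 per second reported for the full data.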

The results of this study convinced the 
authors that the limiting of the perceived 
number of flashes was due to some basic 
physiological process. Bartley, as noted above, 
had also arrived at this conclusion, attribut- 
ing the perceptual limitation to a retinal 
process. The next study in the series on 
“temporal numerosity” was an attempt to 
test Bartley’s hypothesis (White, Cheatham, 
& Armington, 1953). 

This was done by studying the electro- 
retinographic (ERG) responses of a subject 
when he was presented with trains of visual 
stimuli such as were used in the first study. 
The results of this experiment did not agree 
with Bartley’s contention that the retina was 
unable to respond to rapid sequences of stim- 
uli. In this study it was found that the 
light-adapted retina reacted to each separate 
flash in a series of flashes up to at least 45 
flashes per second, the highest rate used.³
There was evidence, however, that retinal 
processes may have been responsible for cer- 
tain perceptual phenomena noted in the first 
study. For example, the perceptual fusion 
found immediately after the onset of a train 
of stimuli seemed to correspond in time to 
the duration of the scotopic B wave produced 
by the first flash of that train. This, plus 
certain other results, led the authors to the 
tentative conclusion that the level of adapta- 
tion of the eye (thus the relative effectiveness 
of the scotopic components) could affect the 
number of flashes perceived in any given 
stimulus train. However, the ability of the
retina to respond to high rates of stimulation
was interpreted as evidence that retinal
processes did not determine the upper
limit of the number of flashes perceived in
any given stimulus train. This limiting function
was ascribed to some more central process
in the nervous system.

3 Later work, with refined recording and analysis
techniques, has shown that the retina can respond
at much higher rates than this. L. A. Riggs (personal
communication, 1962), for example, has obtained retinal
following at rates as high as 125 per second.

Another experiment performed at this same 
time gave additional support to the conten- 
tion that processes at the retinal level were 
not the primary limiting factor for perceptual 
rate. In this study stereoscopic presentation 
of the stimuli was used, with each successive 
flash in a stimulus train being presented to 
the left and right eyes alternately. The ap- 
pearance to the subject was that of a single 
flashing light. Number-rate combinations 
were chosen which the earlier study had shown 
to be inaccurately perceived by the subjects, 
but such that a stimulus train consisting of 
only half the stimuli presented at half the 
rate would probably be perceived accurately. 
Under these conditions it was found that the 
subjects could count the stimuli presented 
to each eye accurately with monocular view- 
ing, but with stereoalternative presentation 
of the stimuli their reports of the perceived 
number of flashes corresponded to those ob- 
tained in the first study. For example, 2 
flashes at 7.5 per second were accurately 
counted by each subject when presented mo- 
nocularly, but 4 flashes at 15 per second, the 
combined stimulus train, produced a mean 
response of 2.4 flashes when presented in the 
stereoalternate manner. This corresponds to 
a mean value of 2.24 flashes obtained in the 
original study for this number-rate combination.
It is believed that this simple demonstration
offers striking proof that retinal
processes are not the limiting factor for per- 
ceptual rate (White et al., 1953). 

Once the conclusion had been reached that 
a central process was limiting the perception 
of sequentially presented light flashes, the 
next question to arise was whether or not 
similar perceptual limiting occurred in sense 
modalities other than vision. In order to see
if this is so, Cheatham and White (1954) 
essentially replicated the original experiment, 
using sequences of sound pulses instead of 
flashes of light. The results of this study 
were quite definite. The perception of the
pulses of sound appeared to be limited in a
manner similar to that found in the visual 
experiment. The actual stimulus rates pre- 
sented varied from 10 to 30 pulses per second, 
while the corresponding overall perceptual 
rates varied only from 9 to 11 pulses per 
second. These values for the perceptual rate 
are derived from the slopes of the function 
obtained when the mean perceived number 
of pulses is plotted against time. It will be 
noted that these values of perceptual rate 
for audition are higher than those found for 
vision in the original study. This, plus other 
differences which were found, such as the 
fact that there was no perceptual fusion im- 
mediately following the onset of a stimulus 
train, strengthened the authors’ belief that 
certain aspects of the original visual results 
were due to scotopic processes at the retinal 
level. A re-evaluation of these data has shown 
that the overall slope of 9-11 per second is 
misleading, however, so the visual and audi- 
tory results are more alike than was suspected 
earlier. This will be explained more fully in 
the Discussion section of this paper. 

In the next series of experiments on this 
topic, White and Cheatham (1959) studied 
one additional sense modality, touch, and 
replicated portions of the original visual 
studies under conditions which would tend to 
minimize any scotopic effects. The results of 
the study of the perception of number of tac- 
tually presented stimulus sequences were in- 
distinguishable from the results of the audi- 
tory study. 

For the visual study the surround illumina- 
tion was raised to 1,000 ft-L, all other 
parameters being held as they were in the 
original visual experiments. The surround 
illumination in the original study was only 
1.8 ft-L. Only the 30 per second rate was used, 
and only 1-10 flashes at that rate. It was
felt that this would be an adequate sample 
to determine whether or not the gross change 
in surround illumination would produce a sig- 
nificant change in the perceived number of 
flashes. The results obtained showed that a 
significant change did occur. The period of 
initial perceptual fusion, while not completely 
eliminated, was shortened, and the slope of the 
function once it did begin to rise was found to
be the same as that of the auditory and tactual
functions. 

The limiting perceptual rate common to 
audition, touch, and vision (foveal, with sco- 
topic effects minimized) was assumed, on the 
basis of the above findings, to be about 12 per 
second, or about 80 milliseconds per perceived 
unit. This was interpreted by the authors as 
strong evidence 
in favor of the hypothesis that there is some temporal 
process in the central nervous system that tends to 
limit the perception of the inputs of the major sense 
modalities.⁴

Forsyth and Chapanis (1958) have pub- 
lished an account of an experiment which 
confirmed the general findings reported by 
Cheatham and White (1952) in their first 
paper. Forsyth and Chapanis presented trains 
of flashes varying from 1 to 20 in number at 
six frequencies (2.5-30 per second). These 
stimuli were presented to the fovea, and at 
five other retinal positions extending out to 
40° from the fovea. The luminance of the 
flash was approximately 1,800 mL, and that 
of the surround about 22 mL. The test field 
subtended one half of a degree of visual angle. 
The results they obtained with foveal presen- 
tation were in good agreement with the 
Cheatham and White data, the minor differ- 
ences which were found being attributed to 
the fact that trained subjects were used in 
the earlier study and that a different psycho- 
physical technique was employed in each 
study. The results they obtained with the 
extrafoveal presentations showed that the 
perceived number of flashes reported for a 
given number-rate combination became less 
as the periphery of the retina was approached, 
the overall slope of the numerosity function 
remaining essentially constant for all retinal 
locations, however.

When Forsyth and Chapanis plotted the 
perceived number of flashes as a function of 
time, they noted that a break in the function 
occurred at about 300 milliseconds, the slope
of the function below that point corresponding
to a perceptual rate of about 13 per second,
and the slope above that point corresponding
to a perceptual rate of about 6 per second.
It is to be recalled that a similar break in
this function at around 300 milliseconds was
found by Cheatham and White, who also
found that the perceptual rate beyond this
point was about 6-7 per second. Their estimate
of the perceptual rate preceding the 300
millisecond break in the function was about
10 per second rather than the 13 per second
suggested by Forsyth and Chapanis, but this
appears to be due to differences in the techniques
of estimating the slopes.

4 On the basis of the results obtained in studies
dealing with perceived temporal order in the various
sense modalities, Hirsh and Sherrick (1961) postulated
the existence of “some kind of time-organizing system
that is both independent of and central to the sensory
mechanisms [p. 431].”

The one other study to date of this type 
is that reported by Page (1957). He obtained 
visual numerosity data which are in general 
agreement with the findings of Cheatham and 
White (1952), and Forsyth and Chapanis 
(1958), the only major difference being that 
he did not detect any critical point in his 
function such as was found in the other two 
investigations. Page noted that all of his sub- 
jects showed a high degree of reliability in 
their responses, and there were definite indi- 
vidual differences. He felt this was significant 
since no such marked individual differences 
were noted when he obtained critical flicker 
fusion (cff) thresholds for these same sub- 
jects. He also reported that the various light- 
dark ratios he used (.2, .5, and .8) did not 
result in any significant change in his sub- 
jects’ responses to any particular number- 
rate combination. 

On the basis of the results of all the studies 
which have been described, it was tentatively 
concluded that there is a cyclic process, or 
a number of similar processes, in the central 
nervous system which interacts with afferent 
neural activity in such a way as to establish 
an upper limit of the perceptual rate for the 
various sense modalities. The period of this 
cyclical process was assumed to be function- 
ally equivalent to the psychological unit of 
duration hypothesized in various forms by a 
number of previous writers. The results of 
experiments on perceptual rate in the senses 
of hearing, touch, and vision indicated that 
the upper limit of the perceptual rate in these 
three modalities is probably about 12 per 


j 
j 


. 


| 


TEMPORAL NUMEROSITY 11 


second, the period of which is about 80 milli- 
seconds. 

The results of these studies also indicated 
that the sense of vision may be a special case, 
since there appeared to be a second rate-limit- 
ing process, apparently at the retinal level, 
which acts upon the visual input. It was 
tentatively concluded that this is related to 
some activity of the scotopic visual system, 
since the conditions under which it appears 
to be most effective are those which would 
be most conducive to a relatively high level 
of scotopic activity. 

The exact nature of the hypothesized cen- 
tral processes could never be inferred from 
the results of perceptual studies such as those 
which have been discussed. The goal of such 
studies should be, instead, to specify as accu- 
rately as possible the temporal course of per- 
ceptual events, and to determine those con- 
ditions which modify this course of events 
in any significant manner. Such information 
could be of great value to those who can study 
the processes of the central nervous system 
directly. 


PURPOSE OF THE STUDY 


A major shortcoming of the hypothesis that 
the temporal numerosity technique provides 
information regarding the temporal character- 
istics of a central process which has a pro- 
found effect on all the major senses is the fact 
that the results obtained with vision are mark- 
edly different from the results obtained in 
the auditory and tactual studies. The studies 
on visual numerosity which have been per- 
formed to date show that the results vary as 
a function of certain viewing conditions, such 
as state of adaptation of the eye and the 
retinal locus of stimulation. More informa- 
tion regarding the nature and extent of such 
variations in the visual numerosity function 
may provide a better indication of their basis 
and help to clarify the relationship between 
the results obtained with the various senses. 

Another problem for such a hypothesis is 
the observed fact that the apparent rate of 
flicker decreases markedly as a source flashing 
at a given rate is moved from the center of 
the visual field toward the periphery. It has 
been reported (Forsyth & Chapanis, 1958)
that the slope of the numerosity function,
that is, the rate at which additional perceived 
flashes are added to a flicker sequence, is the 
same regardless of the part of the retina 
stimulated. This would appear to be a contra- 
diction of the easily observable apparent rate 
phenomenon, if it is to be assumed that the 
apparent rate of flicker is determined by the 
rate at which perceived flashes are added. 
More evidence was necessary on this before 
anything definite could be concluded. 

In order to obtain the information neces- 
sary to deal with these problems it was de- 
cided that visual numerosity functions should 
be obtained for a wide variety of viewing 
conditions. The adaptation level of the eye, 
the retinal locus of stimulation, and the inten- 
sity of the flashing light were chosen as the 
most appropriate variables. The first two of 
these variables had already been shown to have 
an effect on the results, but they had not been 
systematically varied in the earlier studies. 
The third variable, the flash intensity, had 
not appeared to be an important variable in 
an earlier study (Cheatham & White, 1952), 
but this work had been limited to foveal 
presentation of the stimulation. It was 
thought that it might become a factor when 
peripheral stimulation was utilized. 


METHOD 


Apparatus 


The only major apparatus required for this study 
was a device which would provide any desired num- 
ber of light flashes at a given rate. Although the 
device used in the earlier visual studies was still 
available, this consisting primarily of a dc light source
and a slotted disk, it was decided that that technique 
would be too inefficient because of the great number 
of observations required in the present study. Instead, 
an all-electronic system for producing the trains of 
light flashes was utilized, consisting of commercially 
available standard laboratory equipment. 

A block diagram of the system for producing the 
flash sequences is shown in Figure 1. A standard 
pulse frequency of 100 kc was generated by a Berke- 
ley Universal counter and timer (Model 5510). The 
output was fed into a Berkeley electronic clock 
(Model 5631-1), the function of which was to divide 
the standard pulse frequency down to an appropriate 
level by allowing a pulse to pass through it only after 
a predetermined number of pulses had been received. 
The output from the electronic clock was fed into a 
Berkeley dual preset counter (Model 5446-4), which 
was used in conjunction with the dual preset counter
relay. The dual preset counter was set for the frequency
of the incoming pulses. Any given number of
pulses was then obtained by setting the appropriate
dials on the counter to the number desired. When the
experimenter triggered the start of a stimulus sequence
by pressing a switch, the dual preset counter relay
closed for an interval which was just sufficient to
allow the desired number of pulses to pass through,
the duration of this interval of closure being determined
by the two settings made on the dual preset
counter, i.e., in regard to the frequency of the pulses
and the number of pulses required.

Fig. 1. Block diagram of equipment used to produce
stimulus sequence: (A) Berkeley Universal counter and
timer (Model 5510); (B) Berkeley electronic clock
(Model 5631-1); (C) Berkeley dual preset counter
(Model 5446-4); (D) relay; (E) Grass photostimulator
(Model PS-2C); (F) flash lamp.

The pulses which were allowed to pass through the 
relay then triggered a Grass photostimulator, whose 
flash lamp produced the stimulus sequence. The out- 
put of the Grass photostimulator was also fed into a 
second Berkeley Universal counter and timer (Model 
5510), which indicated the actual number of flashes 
which had been produced. This was necessary be- 
cause the closing of the dual preset counter relay was 
not synchronized with the pulses coming into it, so 
that depending upon the phase of the pulse cycle 
when closure did occur (and the fact that closure 
time varied a small amount) the relay might pass one 
more or one less pulse than was desired by the experi- 
menter. 
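
The reason an unsynchronized relay can pass one pulse too many or too few may be seen in a small idealized model, sketched here in Python. The model is an assumption and not a description of the actual Berkeley equipment: pulses are taken to arrive at a fixed period, the relay is taken to close at time zero and open again after a stated duration, and times are expressed in whole microseconds to avoid rounding problems. The function name and the example values are illustrative.

```python
def pulses_passed(gate_us: int, phase_us: int, period_us: int) -> int:
    """Count the pulses that fall inside a relay closure.

    The relay closes at t = 0 and opens at t = gate_us.
    Pulses arrive at phase_us, phase_us + period_us, and so on,
    where 0 <= phase_us < period_us is the delay to the first pulse.
    """
    if period_us <= 0:
        raise ValueError("period must be positive")
    if not 0 <= phase_us < period_us:
        raise ValueError("phase must lie within one period")
    if gate_us <= phase_us:
        return 0
    # Integer ceiling of (gate_us - phase_us) / period_us.
    return (gate_us - phase_us + period_us - 1) // period_us


PERIOD_US = 40_000          # 25 pulses per second
DESIRED = 5                 # pulses wanted by the experimenter
NOMINAL_GATE_US = DESIRED * PERIOD_US
```

In this model a closure lasting exactly five periods always passes five pulses whatever the phase, whereas a closure that runs slightly long or slightly short passes six or four pulses for some phases. This is consistent with the use of a second counter to record the number of flashes actually produced.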

The flash lamp of the Grass photostimulator was 
connected to the rest of the device by a flexible cable 
about 10 feet in length. This made it possible to have 
only the flash lamp itself in the subject’s cubicle, with 
the remainder of the apparatus outside. The flash 
lamp was mounted at eye level behind a large piece 
of white art board which was in the form of a half 
cylinder. The face of the flash lamp was mounted
flush with the art board at a point where a 1-inch
square hole had been cut out. The face of the flash
lamp was masked, except for a circular area 1/4 inch
in diameter. A single sheet of Ozalid paper was
mounted between the face of the flash lamp and the 
mask to serve as a diffusing screen. 

The surround illumination was provided by two
spotlights located behind the subject, one shining over
his left shoulder and the other over his right shoulder. 
The light output of both spotlights was controlled by 
heavy-duty rheostats. 

An audible click was produced each time the flash 
lamp was activated. In order to remove this cue, 
white noise was fed into the cubicle by means of a 
loudspeaker located beside the subject. The intensity 
of this noise was set to be just strong enough to 
completely mask the clicks produced by the flash 
lamp. 

A chin rest was provided for the subject, located 
such that when he was in position his right eye was 
15 inches from the face of the flash lamp. (Thus 
under standard conditions the stimulus light sub- 
tended a visual angle of 1°.) All observations were 
made monocularly, using the right eye, the left eye 
being covered by an eyepatch. Fixation points were 
inscribed on the background screen at every 10° 
interval from 70° to the left to 70° to the right of the 
stimulus light source. Each such point was identified 
so there would be no confusion when the subject was 
directed to fixate a given point. 
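
The stated visual angle can be checked from the viewing geometry. The Python sketch below uses the usual formula for the angle subtended by an object viewed head-on, taking the circular aperture to be 1/4 inch in diameter and the viewing distance to be 15 inches; the function name is illustrative.

```python
import math


def visual_angle_deg(size: float, distance: float) -> float:
    """Angle, in degrees, subtended by an object of the given size
    viewed head-on from the given distance (same length units for both)."""
    if size <= 0 or distance <= 0:
        raise ValueError("size and distance must be positive")
    return math.degrees(2.0 * math.atan((size / 2.0) / distance))


stimulus_angle = visual_angle_deg(0.25, 15.0)
```

With these values the result is a little under 1°, in agreement with the figure given for standard conditions.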


Experiment I 


Visual numerosity functions were obtained under 
each of 36 viewing conditions for two experienced 
observers (ML and CW). These 36 conditions rep- 
resented all the possible combinations for three levels 
of adaptation, four retinal positions, and three levels 
of stimulus intensity. The three adaptation levels 
used were 10, 100, and 1,000 ft-L. The four retinal 
positions were the fovea and points 10°, 20°, and 30° 
out from the fovea on the temporal retina along the 
horizontal plane. It is not possible to state the exact 
stimulus intensities used, because of the nature of the 
flashes. On the control panel of the photostimulator 
the levels used were those designated as 2, 4, and 8. 
The instruction book accompanying the photo- 
stimulator states that each successive intensity setting 
produces a flash that is judged to be approximately 
twice as bright as that produced by the preceding 
setting, so it can be assumed that the intensity settings 
used were each about one log unit apart, as were the 
adaptation levels. These three particular settings were 
chosen because the lowest level, that is, “2,” was just 
visible under the most extreme conditions of retinal 
locus and contrast. 

The stimulus presentation rate of 25 per second was 
used exclusively in this study. The rate most com- 
monly used in the earlier studies on this topic was 
30 per second, so the rate used in the present work 
was chosen to be close to that so the results would 
be more comparable. The nature of the system used 
to produce the trains of pulses in the present study 
was such that a rate of 30 per second was not at- 
tainable (only submultiples of 100 per second could 
be attained, such as 50, 33 1/3, 25, and 20 per second).
The 25 per second rate was the highest available rate 
that still resulted in clearly distinguishable flicker, so 
it was used rather than the next higher rate, 33 1/3 per
second, even though this latter rate was nearer to the
rate used in the earlier work. 
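
The restriction on attainable rates can be made concrete with a short Python sketch. It assumes, as the listed values suggest, that the attainable rates are 100 per second divided by a whole number; the function names are illustrative, and exact fractions are used so that 33 1/3 is represented without rounding.

```python
from fractions import Fraction


def attainable_rates(base_rate: int = 100, max_divisor: int = 5):
    """Rates obtainable by dividing base_rate by 1, 2, ..., max_divisor."""
    if base_rate <= 0 or max_divisor < 1:
        raise ValueError("base_rate and max_divisor must be positive")
    return [Fraction(base_rate, d) for d in range(1, max_divisor + 1)]


def nearest_rate(target, rates):
    """Attainable rate closest to the target rate."""
    return min(rates, key=lambda r: abs(r - target))


rates = attainable_rates()
closest_to_30 = nearest_rate(30, rates)
```

On this assumption 30 per second is not among the attainable rates, and 33 1/3 per second is the nearest to it, so the choice of 25 per second reflects the flicker criterion described above rather than numerical closeness.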

The numbers of stimuli used per sequence were 
2, 3, 4, 5, 6, 7, 9, 11, 13, and 15. It was felt that 
it would not be necessary to use every consecutive 
number of stimuli per sequence in order to obtain 
the desired information regarding the numerosity 
function. Ten responses were obtained for each such
number sequence under all the conditions utilized in 
the present study. 

The experimental program for each observer con- 
sisted of 12 sessions, each session lasting approxi- 
mately 2 hours. Only one adaptation level was used 
during any 1 session, the order of the three adapta- 
tion levels over the 12 sessions being balanced such 
that the average position of each level in the series of 
sessions was about the same. Each session consisted 
of 15 subsessions, representing each of the 12 retinal 
position-stimulus intensity combinations and a repetition
of three such combinations. Each of the 36 conditions
was represented by 5 such subsessions, whose average
position in the total series was as nearly as possible
the same as that of the subsessions representing each
of the other conditions. During each subsession
two judgments were obtained for each of the
10 stimulus sequences, the order of presentation of 
the different stimulus sequences being random. Thus 
300 judgments were required of the observer during 
any 1 session, for a total of 3,600 judgments for the 
entire experiment. 
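
The bookkeeping of this design can be verified with a minimal sketch. All numbers are taken from the description above; the variable names are illustrative only.

```python
from itertools import product

ADAPTATION_FT_L = (10, 100, 1000)        # surround levels, ft-L
RETINAL_POSITION_DEG = (0, 10, 20, 30)   # 0 = fovea
FLASH_SETTING = (2, 4, 8)                # photostimulator panel settings
FLASHES_PER_SEQUENCE = (2, 3, 4, 5, 6, 7, 9, 11, 13, 15)

# Every combination of adaptation level, retinal position, and flash setting.
conditions = list(product(ADAPTATION_FT_L, RETINAL_POSITION_DEG, FLASH_SETTING))

SESSIONS = 12
SUBSESSIONS_PER_SESSION = 15
JUDGMENTS_PER_SEQUENCE = 2               # per subsession

judgments_per_subsession = JUDGMENTS_PER_SEQUENCE * len(FLASHES_PER_SEQUENCE)
judgments_per_session = SUBSESSIONS_PER_SESSION * judgments_per_subsession
total_judgments = SESSIONS * judgments_per_session
subsessions_per_condition = SESSIONS * SUBSESSIONS_PER_SESSION // len(conditions)
responses_per_cell = subsessions_per_condition * JUDGMENTS_PER_SEQUENCE

print(len(conditions), judgments_per_session, total_judgments,
      subsessions_per_condition, responses_per_cell)
```

The totals agree with the text: 36 conditions, 300 judgments per session, 3,600 judgments in all, 5 subsessions per condition, and 10 responses for each number of flashes under each condition.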

Prior to the beginning of each session the observer 
was allowed to remain in complete darkness for 10 
minutes. Following this the appropriate level of sur- 
round illumination was established and the observer 
was allowed to adapt to this level for a period of 
5 minutes before the testing began. At the beginning 
of each subsession the observer was instructed to 
fixate a given point, and he repeated this instruction 
to insure that there was no misunderstanding. When 
the experimenter had selected the number of flashes 
to be presented he would alert the observer by saying 
“Ready.” Approximately 2 seconds following this the 
experimenter would press a switch which turned on 
the masking noise. This masking noise remained on 
for a period of 5 seconds. At some time during this 
period the experimenter would press a second switch 
which would start the stimulus sequence. When the 
masking noise stopped the observer would state the 
number of flashes perceived, and this response was 
recorded by the experimenter. This procedure was 
then repeated for the remainder of the session. 

It was originally planned that more than two observers
were to be used in this experiment, but by the
time the second observer had been run, the nature of 
the result showed that to do so would be quite super- 
fluous. Instead, it was felt that it would be much 
more instructive to study the responses of these same 
two observers under more extreme viewing condi- 
tions. This decision led to Experiment II. 


Experiment II 


The purpose of this experiment was primarily to 
determine whether or not the finding that the slope 


of the visual numerosity function beyond 300 milli- 
seconds remained constant regardless of retinal locus 
of stimulation would hold for a more extreme locus 
of stimulation than had been tested in Experiment I 
of this study or in the Forsyth and Chapanis study. 
It was also decided to use this opportunity to deter- 
mine if differences in the spectral characteristics of 
the stimulus appear to produce any differences in the 
numerosity function. 

The stimulus rate was the same as in Experiment I,
25 flashes per second. The surround illumination was 
held constant at 1,000 ft-L, and the highest available 
flash intensity was used at all times (Level 16 on the 
Grass photostimulator). The stimulus surface sub- 
tended a visual angle of 1°, as in the previous experi- 
ment. The retinal positions tested were the fovea, 
and points located 30° and 70° from the fovea along 
the horizontal plane on the nasal retina. The differ- 
ences in the spectral characteristics of the stimulus 
flashes were obtained by the use of two color filters, 
one a deep red (Wratten No. 29) and the other a 
green and red absorption filter (Wratten No. 48). 

The experimental program consisted of two 2-hour 
sessions for each of the observers, with 300 judgments 
being required during each such session. During each 
session the color filters were alternated after every 
30 judgments. While a given filter was in use all of 
the 30 stimulus number-retinal position combinations 
were presented in a quasi-random fashion. The 
procedure for presenting the stimulus sequences and 
recording the observer's responses was the same as 
that described for Experiment I.
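
A session of this kind can be sketched as follows. The sketch assumes that "quasi-random" can be approximated by an independent shuffle within each block of 30 judgments and that the red filter came first; neither detail is stated in the text, and the function name is hypothetical.

```python
import random
from itertools import product

FLASHES = (2, 3, 4, 5, 6, 7, 9, 11, 13, 15)
POSITIONS_DEG = (0, 30, 70)      # fovea, 30 deg and 70 deg on the nasal retina
FILTERS = ("red", "blue")        # Wratten No. 29 and No. 48; starting filter is assumed


def session_schedule(n_judgments=300, block_size=30, seed=None):
    """Return a list of (filter, n_flashes, position) trials for one session."""
    rng = random.Random(seed)
    combos = list(product(FLASHES, POSITIONS_DEG))
    if len(combos) != block_size:
        raise ValueError("each block must contain every combination once")
    schedule = []
    for block in range(n_judgments // block_size):
        color = FILTERS[block % 2]   # filters alternate after every block
        order = combos[:]
        rng.shuffle(order)           # stand-in for the quasi-random order
        schedule.extend((color, n, pos) for n, pos in order)
    return schedule


trials = session_schedule(seed=0)
print(len(trials), trials[0][0], trials[30][0])
```

With two such sessions per observer, each filter, retinal position, and number of flashes is judged 10 times.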


Experiment III 


A very interesting phenomenon indicated by the 
results of Experiment II led to this brief experiment. 
All the conditions were the same as described for the 
preceding experiment except that only foveal stimula- 
tion was used and the size of the stimulus spot was 
reduced so that it subtended a visual angle of 
15 minutes of arc. 

The experiment consisted of a single session for each 
observer, during which 200 judgments were required.
The procedure was the same as in Experiment II 
except that the color filters were alternated after
every 20 judgments.


RESULTS


Experiment I 

The results of this experiment are pre- 
sented in Figures 2 and 3, for Observers ML 
and CW, respectively.⁵


5 Copies of the thesis on which this paper is based, 
including complete tabulations of all the data, can be 
obtained from University Microfilms, Inc., 313 North 
First Street, Ann Arbor, Michigan. Order No. 63- 
1060 remitting in advance $2.75 for Microfilm or 
$6.80 for Xerox copies. 




The general similarity of all the functions 
shown in these graphs is quite apparent. 
Specific differences in these functions produced 
by the various experimental conditions are 
also clearly shown. It can be seen that the 
functions for the peripheral retinal locations 
lie below those for the foveal or near foveal 
locations, indicating that for any given num- 
ber of flashes in a stimulus sequence fewer 
flashes were perceived by the observers when 
the more peripheral locations were stimulated. 

By comparing Figures 2 and 3 it can also 
be seen that the two observers responded quite 
differently to the peripheral stimuli, a change 
in the retinal locus of stimulation affecting 
Observer CW much more radically than it 
did Observer ML. The nature of the differences
between the results of these two observers
seems to indicate that there was a
marked difference in peripheral sensitivity be- 


tween them. This can be seen most clearly 
if their responses under the more extreme 
viewing conditions are compared (e.g., 30° 
retinal position, 1,000-ft-L surround, Flash 
Intensity Level 2). In such viewing conditions 
CW was unable to see the stimulus flashes a 
high percentage of the time, while ML only 
rarely could not detect them. 

It is, however, the similarities between the 
results of these observers and not the differ- 
ences that are most important. When those 
situations in which the visibility of the stim- 
ulus light was difficult are ignored, as they 
should be in a study of this kind, the really 
meaningful findings become apparent. The 
most striking thing to be noted is that the 
slopes of the second halves of all the functions 
seem to be about the same, regardless of the 
marked differences which exist in the first 
halves of these functions. These slopes appear 


Fig. 2. Subject ML: Mean perceived number of flashes (Ns) as a function of the number of flashes
presented (No), for various viewing conditions. Panels are arranged by surround brightness (ft-L).
(Stimulus presentation rate = 25 flashes per second. Target diameter = 1° visual angle.)




Fig. 3. Subject CW: Mean perceived number of flashes (Ns) as a function of the number of flashes
presented (No), for various viewing conditions. Panels are arranged by surround brightness (ft-L)
and flash brightness (relative units).
(Stimulus presentation rate = 25 flashes per second. Target diameter = 1° visual angle.)


to be similar for both observers, and under 
all the viewing conditions wherein the stim- 
ulus light was clearly visible, including the 
various retinal locations stimulated. This last 
result, concerning the retinal locus of stimula- 
tion, confirms the finding reported by Forsyth 
and Chapanis (1958). 

On the basis of the results of the present 
experiment, combined with the findings re- 
ported earlier, it is concluded that the visual 
numerosity function consists of two major 
segments. The nature of the first segment 
can apparently be influenced to a great extent 
by the viewing conditions and by individual 
differences. The second segment does not 
appear to be influenced by such factors. The 
transition point between these two segments 
seems to lie in the region about 250-300 milli- 
seconds following the onset of stimulation (the 


region of time occupied by the seventh and 
eighth flashes in the present study). 
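
The correspondence between flash number and elapsed time can be made explicit with a small sketch. It assumes that the first flash occurs at time zero and that each flash is assigned the 40-millisecond slot that follows its onset; the helper names are illustrative.

```python
RATE_PER_S = 25
INTERVAL_MS = 1000 / RATE_PER_S   # 40 ms between flash onsets


def onset_ms(flash_index):
    """Onset time of the n-th flash (1-based), taking the first flash as t = 0."""
    return (flash_index - 1) * INTERVAL_MS


def flashes_in_window(start_ms, end_ms, max_flashes=15):
    """Flashes whose slot [onset, onset + interval) overlaps the given window."""
    return [n for n in range(1, max_flashes + 1)
            if onset_ms(n) < end_ms and onset_ms(n) + INTERVAL_MS > start_ms]


print(onset_ms(7), onset_ms(8))        # 240.0 280.0
print(flashes_in_window(250, 300))     # [7, 8]
```

On these assumptions the 250-300 millisecond region is occupied by the seventh and eighth flashes, as stated.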

In order to obtain a clearer picture of the 
visual numerosity function all the appropriate 
data for each retinal locus of stimulation were 
combined for each observer. The data ob- 
tained when Flash Intensity Level 2 was used 
in conjunction with surround brightness of 
1,000 and 100 ft-L at positions 20° and 30° 
from the fovea were not included, for the 
reason already discussed. These combined 
results are shown in Figure 4. The various
aspects of the visual numerosity function 
which have been described are clearly demon- 
strated, such as the constant slope during the 
latter half regardless of the retinal locus of 
stimulation. The two segments of the function 
are readily seen with the transition between 
them occurring approximately at that point 




in time after the onset of stimulation which 
is occupied by the seventh flash (i.e., about 
250 milliseconds). Measurement of the maxi- 
mum slopes shown during the first and second 
segments indicates maximum rates of increase 
in perceived number of about 12 per second 
and 6-7 per second, respectively. It is to be 
noted that all differences between the func- 
tions originate during the initial segment, and 
whatever differences in absolute level that 
exist at the point in time represented by the 
seventh flash tend to remain constant from 
then on. 
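
The number of judgments behind each point of the combined functions follows from the exclusion rule given above. A minimal sketch, using only the quantities stated in the text (10 responses per number of flashes under each condition), is shown here; the function names are illustrative.

```python
from itertools import product

ADAPTATION_FT_L = (10, 100, 1000)
FLASH_SETTING = (2, 4, 8)
RESPONSES_PER_CELL = 10   # responses per number of flashes per condition


def excluded(position_deg, adaptation_ft_l, flash_setting):
    """Near-threshold conditions omitted from the combined data."""
    return (position_deg in (20, 30)
            and adaptation_ft_l in (100, 1000)
            and flash_setting == 2)


def judgments_per_point(position_deg):
    """Judgments pooled into each plotted point for one retinal position."""
    usable = [c for c in product(ADAPTATION_FT_L, FLASH_SETTING)
              if not excluded(position_deg, *c)]
    return len(usable) * RESPONSES_PER_CELL


print({pos: judgments_per_point(pos) for pos in (0, 10, 20, 30)})
```

The result is 90 judgments per point for the fovea and 10° functions and 70 for the 20° and 30° functions, in agreement with the counts given in the caption of Figure 4.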

Figure 4 also shows the differences in the 
response patterns of the two observers rather 
well. Again it can be seen that the differences 
between corresponding functions for these two 
observers originate during the period immedi- 
ately following the onset of stimulation. The 
differences between the results for these ob- 
servers seem to be due to the tendency for CW 
to exhibit longer periods of initial fusion under 
conditions of nonfoveal stimulation than does 


Fig. 4. Mean perceived number of flashes as a
function of the number of flashes presented, with
retinal locus of stimulation as parameter. (All the
usable data obtained under the various flash intensity-
surround brightness combinations have been combined;
see text for details. Each point in the fovea
and 10° functions represents 90 judgments, while each
point in the 20° and 30° functions represents 70
judgments.)


ML. In spite of these differences, however, 
the basic characteristics of the numerosity 
functions are the same for both observers, i.e., 
in terms of the slopes and the transition point. 

This is even more clear in Figure 5, in 
which these observers’ functions for foveal 
stimulation and for stimulation of a point 20° 
from the fovea are directly compared. It can 
be seen that the two foveal functions are 
practically identical in every respect, while in 
the 20° condition Observer CW continued to 
report “one” for a longer period than did ML, 
thus accounting for the difference in the 
absolute level of the two functions. 

The results of this experiment have pro- 
vided some important information regarding 
the nature of the visual numerosity function. 
It is now seen to consist of two major seg- 
ments, the first extending from the onset of 
stimulation up to a point in time approxi- 
mately 250 milliseconds after onset. The exact 
form of this first segment can be markedly 
affected by the state of adaptation of the eye, 
the retinal locus of stimulation, and by indi- 
vidual differences between observers. The 
intensity of the flashing stimulus has only 
a slight effect on the results, as long as the 
flashes are clearly visible yet not excessively 
bright relative to the level of adaptation. All 
of the changes in the form of the initial seg- 
ment of the visual numerosity function which 
occur as a function of any of these variables 
consist primarily of variations in the duration 
of the initial fusion period, that is, the period 
immediately following the onset of stimulation 
during which the observer tends to perceive 


Fig. 5. Sample functions from Figure 4, replotted to
illustrate certain points brought out in the text.
(Abscissa: number of flashes and time in milliseconds;
ordinate: perceived number.)




only a single flash. The form of the second 
segment of the numerosity function, from 
about 250 milliseconds onward, does not ap- 
pear to be affected by any of these variables. 


Experiment II 


The results of this experiment are presented 
in Figure 6. 

The first thing to be noted is that the slopes 
of the latter part of the functions tend to re- 
main constant even when the locus of stimula- 
tion is as much as 70° from the fovea. This 
result greatly extends the generality of the 
finding first reported by Forsyth and Chapanis 
(1958) and confirmed by Experiment I of the
present study. 

The marked difference in the initial seg- 
ments of the numerosity functions obtained 
with nonfoveal stimulation, shown by the two 
observers in Experiment I, is also exhibited 
here. As before, this difference is due to the 
much longer initial fusion period shown by 
Observer CW. 

A comparison of the data plotted in Fig- 
ure 6 also shows that there is an interesting 
difference between the foveal functions ob- 
tained with the blue and with the red stimulus 
flashes. The nature of this difference is made 
more clear by Figure 7, in which the data for 
both observers are combined.⁶


6 The fact that the difference between these two 
functions is statistically significant (from five flashes 
onward) can easily be shown by applying a simple 
nonparametric statistic, such as the sign test, to either 
the combined data or to the individual data. 
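
The sign test mentioned in this footnote can be sketched as an exact two-sided binomial test. The example assumes, for illustration only, that the test is applied to the seven stimulus numbers from five flashes onward (5, 6, 7, 9, 11, 13, 15) and that the blue function lies above the red at every one of them; the original computation is not given in the text.

```python
from math import comb


def sign_test_p(n_positive, n_negative):
    """Exact two-sided sign test; tied pairs are assumed to have been dropped."""
    n = n_positive + n_negative
    k = min(n_positive, n_negative)
    # Probability of k or fewer of the rarer sign under a fair 50:50 split.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical case: blue exceeds red at all seven stimulus numbers.
print(sign_test_p(7, 0))   # 0.015625
```

Seven differences in the same direction give a two-sided probability of 2/128, or about .016, which would be significant at the .05 level.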

Fig. 6. Mean perceived number of flashes as a
function of the number of flashes presented, with
retinal locus of stimulation and color of stimulus light
as parameters. (1° stimulus; 1,000 ft-L surround.
Stimulus presentation rate = 25 flashes per second;
further details in text.)


Fig. 7. Mean perceived number of flashes as a
function of the number of flashes presented, with the
color of the stimulus light as parameter. (The presentation
time has also been indicated on the abscissa.
Foveal stimulation, with stimulus diameter of 1°
visual angle. Stimulus presentation rate = 25 flashes
per second.)


The functions plotted in Figure 7 indicate 
that at about 140-160 milliseconds after the 
onset of stimulation an additional perceived 
flash was added to the flicker sequence when 
the blue stimulus flashes were employed, but 
not when the red stimulus flashes were used.
Except for this one sudden increase shown for 
the blue stimulus condition, the two functions 
are identical. This phenomenon can only be 
interpreted as meaning that some aspect of 
the visual processes initiated by the red stim- 
ulus flashes used in this experiment tended to 
prevent the perception of a flash at a specific 
point in time following the onset of stimula- 
tion. 


Experiment III 


The results of this experiment are presented 
in Table 1. The mean values of the responses 
are plotted in Figure 8. 
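
The mean values in Table 1 are weighted averages of the response distributions. A minimal sketch of the computation, applied to three rows read from the table, is given below; the helper name is illustrative.

```python
def mean_perceived(distribution):
    """Mean of a response distribution given as {perceived number: count}."""
    n_judgments = sum(distribution.values())
    total = sum(ns * count for ns, count in distribution.items())
    return total / n_judgments


# Rows read from Table 1 (10 judgments per cell).
ml_blue_15 = {5: 4, 6: 6}           # Observer ML, blue, 15 flashes presented
ml_red_15 = {4: 1, 5: 8, 6: 1}      # Observer ML, red, 15 flashes presented
cw_blue_11 = {3: 2, 4: 7, 5: 1}     # Observer CW, blue, 11 flashes presented

for row in (ml_blue_15, ml_red_15, cw_blue_11):
    print(round(mean_perceived(row), 1))
```

These rows give means of 5.6, 5.0, and 3.9, matching the tabled values.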

It can be seen that there is no marked dif- 
ference between the numerosity functions for 
the blue and the red stimuli, such as was found 
in the previous experiment.’ At that specific 
point in time after the onset of stimulation 
at which the differential effect has occurred 
(about 140 milliseconds) a perceived flash 
was added in the case of the red stimulus con- 
dition as well as when the blue stimuli were 
used in the present experiment. Since the 


7 The application of the sign test to these data 
indicates that there is no significant difference between 
the red and the blue functions. 


TABLE 1
DISTRIBUTION OF NUMBER OF PERCEIVED FLASHES (Ns) AS A FUNCTION OF NUMBER OF FLASHES
PRESENTED (No) FOR RED AND BLUE FLASHES PRESENTED FOVEALLY

             Blue (Ns)                           Red (Ns)
 No     1   2   3   4   5   6      M       1   2   3   4   5   6      M

Observer ML
  2     6   4                    1.4       7   3                    1.8
  3     1   9                    1.9          10                    2.0
  4         7   3                2.3           9   1                2.1
  5         3   7                2.7           3   7                2.7
  6            10                3.0           1   9                2.9
  7            10                3.0               9   1            3.1
  9             1   9            3.9               1   9            3.9
 11                 8   2        4.2                   7   3        4.3
 13                    10        5.0                   5   5        4.5
 15                     4   6    5.6                   1   8   1    5.0

Observer CW
  2     9   1                    1.2       8   2                    1.2
  3        10                    2.0          10                    2.0
  4         3   7                2.7           9   1                2.1
  5         3   7                2.7           1   9                2.9
  6         1   9                2.9              10                3.0
  7            10                3.0              10                3.0
  9             7   3            3.4               6   4            3.4
 11             2   7   1        3.9               2   8            3.8
 13                 5   5        4.5                   7   3        4.3
 15                 2   4   4    5.2                   3   7        4.7

Note. Surround, 1,000 ft-L; flash intensity, 16; target diameter, 15′.
only difference in experimental conditions between
this and the previous experiment was
the area of the stimulus surface, the results of
these two experiments seem to indicate the
possibility of a spatial inhibition effect of some
kind which appears as the stimulus area is
increased, this effect occurring only when the
stimulus flashes have certain spectral characteristics.

The plot of the combined data shown in
Figure 8, utilizing all the data obtained in this
particular experiment, probably represents the
“purest” visual numerosity function yet obtained.
That is, it is least affected by
peripheral factors so probably represents the
temporal characteristics of the central process
(or processes) underlying the temporal numerosity
phenomena rather accurately.


DISCUSSION


During the last decade there has been a
marked resurgence of interest in the problem 
of correlating psychological and neurophysio- 
logical phenomena. The disenchantment with 
this type of endeavor felt by many psycholo- 
gists is rapidly disappearing in the light of 
the many recent findings regarding the nature 
of the various neurophysiological phenomena 
and with the development of more sophisti- 
cated techniques for the analysis of such 
phenomena. 

The current feeling on this matter is well 
stated by Fessard (1959) in his introduction 
to the section dealing with brain potentials 
and rhythms in the recently published Hand- 
book of Physiology: 


Fig. 8. Mean perceived number of flashes as a function of the number of flashes presented
(and of the time required to present them), with the color of the stimulus light as parameter.
(Stimulus presentation rate = 25 flashes per second. Foveal stimulation. Stimulus diameter
= 15′ visual angle. Surround illumination = 1,000 ft-L.)


It is difficult and to a certain extent artificial, once 
a potential has been described, not to speak of the 
link it appears to have with an actual operation of 
the nervous system of which it thus becomes a sign: 
projection of an afferent message, interactions be- 
tween central activities or emission of efferent im- 
pulses. This most often involves simple questions of 
functional topography or chronology but may also go 
so far as to relate to highly integrated psychological 
processes or to the well-defined symptoms of patho- 
logical behavior such as those of epileptic seizures 
once it has been recognized that reliable correlations 
exist between these phenomena and some parameter 
or parameters of brain potentials or rhythms that 
have initially been studied for themselves. New 
specific aspects of brain potentials are often dis- 
covered as a consequence of functional explorations of 
this kind [p. 256]. 


In addition to the invaluable information on 
this subject brought together in various 
chapters of the Handbook of Physiology, there 
are also a number of excellent and comprehen- 
sive survey articles now available, covering 
the recent work on this topic, which put more 
stress on the psychological aspects. Two out- 
standing examples of these are the mono- 
graphs by Ellingson (1956) and Mundy- 
Castle (1958). 

One of the psychologists who was not dis- 
couraged by the inconsistencies and the mean- 
inglessness of much of the earlier work dealing 
with EEG was D. B. Lindsley. He realized 
that many of these studies represented naive 
attempts to correlate a poorly understood 
physical phenomenon with even more poorly 
defined complex psychological syndromes, and 
that the predictable failure of such attempts 
by no means represented a fair estimate of the 
possible utility of EEG for psychological research.
His approach to this problem has been
to investigate as fully as possible the neuro- 
physiological events which appear to underlie 
the cyclic patterns of activity, and to try to 
discover whether these events or patterns seem 
to bear any relationship to comparatively 
simple motor or perceptual behavior. In an 
earlier paper on this subject (Lindsley, 1952) 
and in a more recent article (Lindsley, 1955) 
he discussed the results of a number of experi- 
ments, dealing with both motor activity and 
perception, which he felt offered strong evi- 
dence of a definite functional relationship be- 
tween the cyclic activity patterns of the brain 
and certain behavioral events. Among the 
studies he discussed in these articles were a 
number of those which were described in the 
introduction of this paper (e.g., Bartley’s 
brightness enhancement phenomenon, and 
Murphree’s study of perceptual simultaneity). 

Lindsley pointed out that the most fruitful 
approach to this problem would probably be 
by correlating certain temporal aspects of the 
neuroelectric activity with temporal aspects 
of behavior. The reason is that the temporal 
aspects of the neuroelectric activity can be 
determined with the greatest accuracy. Even 
the recent developments in the field of average 
response computers, etc., do not change this, 
although they do make other aspects of the 
neuroelectric activity more useful for studies 
of this kind. Rates of activity and critical 
points in time are the two measures specifically 
mentioned by Lindsley as being useful in this 
respect (Fessard’s “simple question of 
chronology”). 




One of the goals of the present work was to 
provide neurophysiologists with perceptual 
data in a form which would be most useful 
to them in their attempts to discover what 
functions, if any, are served by the various 
gross phenomena observed in neural activity. 
The data obtained in this and the earlier temporal
numerosity studies would seem to be
ideally suited to this purpose, since they can 
be described in terms of rates and critical 
points in time. 


Visual Numerosity Function 


On the basis of the work done to date on 
this topic the following general description of 
the visual numerosity function can be made. 
The one condition which must be met for these 
comments to hold true, however, is that the 
stimulus light be clearly visible to the subject. 
That is, threshold conditions, such as were 
encountered in Experiment I of the present 
study, are not to be included. 

The visual numerosity function seems to be 
made up of two distinct segments: (a) from 
the onset of stimulation up to about 250-300 
milliseconds; and (b) from about 250-300 
milliseconds on. The exact form of the first 
segment of this function appears to be in- 
fluenced to a greater or lesser extent by all 
the major variables, including individual dif- 
ferences between subjects. The second seg- 
ment, however, does not seem to be affected 
by any of these variables. In other words, 
starting at a point in time about 250-300 
milliseconds after the onset of stimulation, the 
rate of increase of the number of perceived 
flashes is approximately constant regardless of 
the conditions of stimulation, including the 
location of such stimulation on the retina. 
This was one of the most striking findings 
in the study reported by Forsyth and Chapanis 
(1958), and it is confirmed by the present re- 
search. Some implications of this finding will 
be discussed in a later section. 

A more definitive description of the numerosity
function for the first 300 milliseconds
can be given for any specific set of conditions.
For example, when the stimulus light is viewed 
directly, this segment of the function consists 
of two definite components: first, there is an 


initial fusion period, during which time the 
subject reports that he sees only a single 
flash; following this there is a short period 
when the function rises rapidly, the slope in- 
dicating a rate of increase of perceived flashes 
of about 12-13 per second. This rate is not 
maintained, however, but instead the function 
tends to level off about 200 milliseconds after 
the onset of stimulation. At about 250-300 
milliseconds after onset, the second major seg- 
ment of the numerosity function, described 
in the preceding paragraphs, begins. As a 
result of all the studies done on this topic to 
date, there is good agreement that
the slope of this portion of the function beyond
300 milliseconds indicates a rate of
increase of perceived flashes of approximately
6-7 per second.
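
Because the functions are plotted against number of flashes presented, a slope must be multiplied by the presentation rate to give a rate per second. The sketch below shows the conversion at 25 flashes per second, with one example taken from the means for Observer ML (blue stimulus) in Table 1 of Experiment III; the function names are illustrative.

```python
RATE_PER_S = 25   # flashes presented per second


def rate_per_second(slope_per_flash):
    """Convert perceived flashes per presented flash into perceived flashes per second."""
    return slope_per_flash * RATE_PER_S


def slope_per_flash(rate_per_s):
    """Inverse conversion: perceived flashes per presented flash."""
    return rate_per_s / RATE_PER_S


def segment_rate(no_a, ns_a, no_b, ns_b):
    """Rate of increase (per second) between two points of a numerosity function."""
    return rate_per_second((ns_b - ns_a) / (no_b - no_a))


print(slope_per_flash(12), slope_per_flash(6), slope_per_flash(7))
print(round(segment_rate(9, 3.9, 15, 5.6), 1))   # ML, blue, 9 to 15 flashes
```

A rate of 12 per second thus corresponds to a slope of about 0.48 perceived flashes per presented flash, and 6-7 per second to about 0.24-0.28. The example segment gives roughly 7 per second.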

It can be seen that the function just described
can provide the neurophysiologist with
four useful pieces of information regarding 
this type of visual perception. These are the 
two rates shown at different times after the 
onset of stimulation, the critical point in time 
marking the beginning of the second, slower, 
rate and the other critical point in time mark- 
ing the end of the initial fusion period. In 
the case of foveal presentation of the stimulus 
the first three of these factors will remain 
essentially constant over a very wide range of 
viewing conditions, but the fourth will vary 
with the duration of the initial fusion period. 
It has been found that this duration depends 
primarily upon the state of adaptation of the 
eye, and to a lesser extent upon the interaction 
between the state of adaptation and the 
intensity of the stimulus light. This phenom- 
enon will be discussed in more detail in a 
later section. 
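
The verbal description of the function can be summarized as a piecewise-linear sketch. This is an illustrative idealization and not a model fitted by the author: the default parameter values are assumptions chosen from the ranges quoted in the text (12-13 and 6-7 per second, leveling off near 200 milliseconds, second segment from about 250-300 milliseconds), and the 80-millisecond fusion period is purely hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumerosityParams:
    fusion_end_ms: float = 80.0       # end of initial fusion period (hypothetical value)
    fast_rate_per_s: float = 12.5     # first rate, within the 12-13 per second range
    level_off_ms: float = 200.0       # where the first rise levels off
    second_start_ms: float = 275.0    # start of second segment, within 250-300 ms
    slow_rate_per_s: float = 6.5      # second rate, within the 6-7 per second range


def perceived_number(t_ms, p=NumerosityParams()):
    """Idealized perceived number of flashes after t_ms of stimulation."""
    n = 1.0                           # a single flash during the fusion period
    fast_ms = max(0.0, min(t_ms, p.level_off_ms) - p.fusion_end_ms)
    n += p.fast_rate_per_s * fast_ms / 1000
    slow_ms = max(0.0, t_ms - p.second_start_ms)
    n += p.slow_rate_per_s * slow_ms / 1000
    return n


for t in (0, 80, 200, 275, 600):
    print(t, round(perceived_number(t), 2))
```

Lengthening `fusion_end_ms` lowers the whole function without changing the slope beyond the second critical point, which is the pattern described for changes in adaptation level and retinal locus.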

As the locus of stimulation is moved further 
and further away from the fovea, there is a 
marked tendency for the initial fusion period 
to become longer. As this occurs, the second 
phase, that is, the 12-13 flashes per second 
rate of increase, becomes less evident so that 
in the most extreme situations the function 
may go directly from the initial fusion period 
into the secondary (6-7 flashes per second) 
rate. A tendency for this to occur can be seen 
in the results for Observer CW in Experi- 
ment I of the present study, and was one of 




the general findings reported by Forsyth and 
Chapanis (1958). 

One of the questions put forth in the intro- 
duction was whether or not the “break” in the 
visual numerosity function (at about 300 
milliseconds) would be found when the eye 
was adapted to a high level of surround bright- 
ness. This was because those studies in which 
such a critical point was found had all utilized 
relatively low levels of surround (Cheatham & 
White, 1952; Forsyth & Chapanis, 1958), and 
the one study which had utilized a high sur- 
round (White & Cheatham, 1959) had not 
presented the subjects with any stimulus trains 
with duration greater than 300 milliseconds. 
The results of the present study provide an 
affirmative answer to that question. 

One of the assumptions stated in the intro- 
duction was that the rate of increase in the 
perceived number of flashes would be greater 
when the eye is adapted to a high brightness 
level. This assumption was based on the re- 
sults of the earlier study in which a high level 
of adaptation was used (White & Cheatham, 
1959). The results of the present study, and 
a closer scrutiny of the earlier results, lead 
to the conclusion that such an assumption was 
unwarranted. It turns out that the major 
effect brought about by increasing the level 
of adaptation of the eye is a marked decrease 
in the duration of the initial fusion period. 
This will result in an increase in the absolute 
number of flashes perceived when a given 
stimulus train is presented, but this should 
not be interpreted as a difference in the rate 
of increase. An examination of the results of 
Experiment I will show that the rate of in- 
crease is not affected to any degree by changes 
in the adaptation level. It is to be remembered 
that we are concerned with only those situa- 
tions in which the subject could see the stim- 
ulus clearly at all times, so any of the viewing 
conditions wherein the stimulus light was near 


the subject’s threshold should be ignored (e.g.,


the results for Observer CW when the stim- 
ulus was presented at the more peripheral loci 
against the high surround levels). 

One general conclusion about the visual 
numerosity function is that the first major seg- 
ment, that is, from the onset of stimulation up 
to about 300 milliseconds, can be influenced 


by external or peripheral factors, while the 
second major segment, from 300 milliseconds 
onward, seems to be completely determined 
by central factors. The maximum attainable 
rate found during the first segment (12-13 
per second) is also undoubtedly determined 
by some central process. Any modifications in 
the form of this segment of the function seem
to be brought about primarily by variations 
in the initial fusion period. 


Initial Fusion Period 


The phenomenon of the initial fusion period 
was one of the most striking results of the 
first study of temporal numerosity by Cheat- 
ham and White (1952). In that study it was 
found that when stimulus trains of the 30 
per second rate were presented, under the 
specific viewing conditions used, trains of one, 
two, or three stimuli were reported as “one” 
by the subjects, whereas a sequence of four 
stimuli was almost invariably reported as 
“two.” The reliability of these judgments 
convinced the authors that some definite 
neurophysiological event was underlying this 
sudden increase in the perceived number. 

After this first study was completed ar- 
rangements were made to obtain ERG records 
of retinal activity produced when the subject 
was presented with rapid sequences of stimuli 
such as were used in the earlier study. The 
viewing conditions in this ERG study were 
made as nearly the same as in the perceptual 
study as possible. An example of the type of 
ERG record which was obtained under these 
conditions is shown in the upper part of Fig- 
ure 9. This particular record illustrates the 
eye’s reaction to a 30 per second stimulus train 
consisting of 20 stimuli. Records such as this 




Fig. 9. Sample electroretinographic (ERG) re-
sponses to sequences of flashes presented at a rate of
30 per second. (Details in text.)




showed quite clearly that the eye could re- 
spond to every flash at stimulus rates such 
as were being used in the perceptual studies so 
that the very low perceived number of flashes 
could not be attributed to an inability of the 
retina to respond rapidly enough. 

However, these records did show something 
which it was felt might help to explain the 
initial fusion period. It is assumed that a 
retinal response such as shown in the illustra- 
tion consists of both scotopic and photopic 
responses. The large, slow response occurring 
at the beginning is presumed to be the scotopic 
B wave, and the small, rather regular re- 
sponses to each of the stimuli in the sequence 
are presumed to represent photopic activity. 
It can be seen that the first few photopic 
responses are superimposed on the scotopic 
response. It is of special interest to note that 
the fourth photopic response is the first one 
that is not superimposed on the larger re-
sponse. Since in the perceptual
counterpart to this ERG study, performed
with the eye at the same level of adaptation,
it was with the presentation of a
fourth stimulus in the sequence that the sub-
jects broke away from the initial fusion period,
it was not difficult to believe that the initial
fusion period might be directly related to the
scotopic B wave.

The ERG records shown in the lower half 
of Figure 9 were obtained expressly to demon- 
strate the plausibility of the hypothesis that 
the duration of the initial fusion period in 
the visual numerosity function is determined 
by the duration of the scotopic response oc- 
curring at the onset of stimulation. Both the 
records show the response of the eye to a 
sequence of 10 flashes of light presented at a 
rate of 30 per second. In one case, however, 
the eye had been well dark-adapted and blue 
light was used, while in the other case the 
eye had been adapted to a relatively high level 
of surround illumination and red light was 
used as the stimulus. In the dark-adapted 
eye, blue light situation, the ERG appears to con-
sist of only a massive B wave, and the subject
reported that he saw only a single bright flash 
of light. In the light-adapted eye, red light 
situation, on the other hand, the scotopic B 
wave apparently has been minimized, and the 


subject estimated that he had seen three 
flashes of light. 

The two examples just given represent the 
extreme situations, but probably give a fairly 
good idea of the possible range of durations of 
the initial fusion period. The ERG record 
shown in the upper part of Figure 9 repre- 
sents an intermediate viewing condition, with 
an intermediate scotopic response and there- 
fore presumably an intermediate duration of 
the initial fusion period. 

Later studies have tended to confirm the 
hypothesized relationship between the degree 
of scotopic activity and the duration of per- 
ceptual fusion at the onset of an intermittent 
light source. For example, in the study by 
White and Cheatham (1959) it was found that 
the fusion period was appreciably shortened 
when the level of adaptation was raised from 
1.8 ft-L to 1,000 ft-L. This effect can also 
be seen in the results of the present study. 
Figure 10 shows the first part of the nu- 
merosity functions obtained for both the sub- 
jects. These data were obtained with foveal 
stimulation and the high brightness stimulus. 
Surround brightness (adaptation level) is the 
parameter. The effect on the initial fusion 
period which is brought about by varying the 
level of adaptation is very obvious. 

The changes in the duration of the fusion 
period which have been noted in the visual 
numerosity studies are consistent with the 
changes one would expect in the magnitude 
of the scotopic response which would occur 
at the onset of stimulation under the various 
conditions which have been studied. The ERG 
records which have been discussed and illus- 
trated clearly show that there is a correlation 
here, at least under the extreme viewing condi- 
tions. The ERG records obtained under the 
intermediate condition suggest that the rela- 
tionship may actually be quite precise, so that 
the simple psychophysical task of reporting 
whether one or two flashes were perceived 
might be used to obtain a fairly accurate 
estimate of the degree of scotopic activity 
involved in the visual response under any 
given set of viewing conditions. There seems 
to be little doubt that a gross correlation 
exists, but further studies involving the simul- 
taneous recording of the retinal responses and 




[Figure 10. Panel title: FOVEA, FLASH BRIGHTNESS 8. Ordinate: PERCEIVED NUMBER. Abscissa: NUMBER OF FLASHES.]

Fig. 10. Mean perceived number of flashes as a
function of the number of flashes presented, with the
level of surround illumination as parameter. (Foveal
stimulation, with stimulus presentation rate of 25 per
second; stimulus diameter = 1° visual angle.)


numerosity judgments will be necessary to 
determine just how fine this correlation 
actually is. 

If it is found that a precise relationship does 
exist, this could become a useful experimental 
or clinical screening technique, since defi- 
ciencies in the scotopic system, either arti- 
ficially induced or inherent, could be quickly 
detected without the need for complex elec- 
tronic equipment and the other associated 
devices. It would also preclude the need for 
lengthy dark adaptation when the time is not 
available to check a large number of subjects 
in this more traditional manner. 

In line with the preceding discussion, it is 
of interest to note that all of the differences 
found between the numerosity functions of 
the two observers in the present study would 
be predictable if it were to be learned that 
Observer ML’s B wave responses were signifi- 
cantly smaller than CW’s, perhaps indicating 


some deficiency in his scotopic vision. It is 
to be recalled that CW’s period of initial 
fusion varied over a considerable range of 
durations, becoming longer under those ex-
perimental conditions which would be expected
to increase the magnitude of scotopic response
to the onset of stimulation. ML’s initial
fusion periods, on the other hand, varied over 
a much more limited range of durations. There 
was essentially no difference in the durations 
of the initial period of fusion for these two 
observers when foveal stimulation was used, 
but marked differences were apparent when 
the locus of stimulation was moved toward 
the periphery, where a greater scotopic re- 
sponse would be expected. 

According to this view the fact that ML 
perceived a greater number of flashes than 
did CW when a given stimulus train was pre-
sented at a nonfoveal location would be inter-
preted as indicating that ML’s peripheral
vision was less sensitive than CW’s. That is, 
the scotopic component in ML’s peripheral 
vision would appear to be less sensitive. If 
this were the case CW’s absolute threshold 
for peripheral stimuli would be lower than 
ML’s. This prediction seems to be contra- 
dicted by the results of Experiment I, in which 
it was found that ML could see the lowest 
intensity flashes in the high surround condi- 
tions with peripheral stimulation while CW 
could not. This, however, was a case of a 
differential threshold at photopic level, so 
perhaps no relationship should be expected 
between the two situations. 

This line of thinking leads to some interest- 
ing speculation concerning the interaction of 
the photopic and scotopic visual systems under 
various conditions. For example, in terms of 
extrafoveal vision could a sensitive or highly 
reactive scotopic system be detrimental to 
a person’s differential sensitivity at photopic 
levels? 


Comparison with Other Sense Modalities 


In addition to the studies on vision, which
have constituted the major portion of the work
to date, the numerosity functions for the
senses of audition and touch have also been
investigated (Cheatham & White, 1954; White
& Cheatham, 1959). A close examination of




the results of these studies shows that the nu- 
merosity functions for audition and touch 
differ from the visual function primarily
in the fact that they do not exhibit the phe- 
nomenon of initial fusion. This fact is inter- 
preted as indirect evidence in favor of the hy- 
pothesis concerning the initial fusion period 
which was discussed in the previous section. 

In place of the fusion effect, however, both 
the audition and touch functions show some- 
thing that is even more interesting. That is, 
with both these modalities the subjects were 
able to report correctly when two stimuli were 
presented at the rate of 30 stimuli per second, 
the highest rate used, but did not appear to 
be able to distinguish this from the situation 
in which three stimuli were presented. In 
other words, the subjects seemed to possess a 
high level of ability to distinguish the second 
stimulus from the first but were not able to 
distinguish the third stimulus from the second, 
although all three stimuli were equally spaced 
in time. As a matter of fact, starting with the 
second stimulus the rate of increase in the 
number of perceived stimuli was found to be 
about 12 per second for both audition and 
touch. It is to be recalled that this is the same 
rate of increase in perceived number that was 
shown by the visual numerosity function for 
the period of time immediately following the 
initial period of fusion. 

An interesting confirmation of the above 
finding, at least for the case of audition, is 
given in the following statement by G. Stanley 
Hall, as quoted by William James (1890, 
p. 614). Hall, who had been experimenting 
with a device which produced clicks in vary- 
ing number and at varying intervals, said: 

In order that their discontinuity may be clearly 
perceived, four or even three clicks or beats must be 
farther apart than two need to be. When two are 
easily distinguished, three or four separated by the 
same interval . . . are often confidently pronounced 
to be two or three respectively. 

In the series of articles which have been 
published on this topic and in the Results sec- 
tion of the present study the mean has been 
utilized for all the graphical representations 
of the data. This has been done in order to 
make the rate characteristics of the numeros- 
ity function more immediately apparent to 
the reader. There has always been a good 


deal of misgiving in this regard, however, 
because the mean values do not provide an 
adequate description of the actual nature of 
the results which have been obtained in this 
type of experiment. 

This fact became obvious during the course 
of the first study on visual numerosity (Cheat- 
ham & White, 1952) when it was found that 
there were certain number-rate combinations 
of stimuli to which all the subjects gave the
same response on all trials. This fact was, of 
course, as important as the number of per- 
ceived flashes reported by the subjects. 

An examination of the distributions of re- 
sponses in this study (as exemplified by 
Table I) and in the previously published 
articles on this topic will show that any dis- 
tributions which could be classified as “nor- 
mal” are definitely exceptions. In the present 
study, for example, this type of distribution 
of responses tends to occur mainly in those 
situations where the judgments were difficult 
to make, because of poor contrast conditions, 
etc. The great number of cases in which all 
the subject’s responses were the same, or 8 or 9 
of the 10 responses the same, would seem to 
indicate that perhaps the mode would be the 
most meaningful measure to use. This would 
seem to be especially true if we are more 
interested in critical points in time rather than 
only in rates. 

The use of means in some of the earlier 
publications on this topic has led to certain 
misconceptions regarding the form of the nu- 
merosity function, even on the part of the 
authors. For example, in the report on the 
auditory numerosity (Cheatham & White, 
1954) the use of means in plotting the func- 
tion relating the perceived number of sound 
pulses to the time required for the presenta- 
tion of the stimulus sequences created the 
impression that the rate of increase in the 
perceived number of pulses remained constant 
from the second stimulus onward, with no 
break in the function occurring at about 300 
milliseconds after the onset of stimulation, 
such as was shown in the visual function. In 
the tactual numerosity study (White & Cheat- 
ham, 1959) the longest duration of any stim- 
ulus sequence presented did not extend beyond 
300 milliseconds so no information in this 




regard was obtained, but it was assumed that 
the function would continue on at the same 
rate also, because the tactual data that were 
obtained matched the corresponding auditory 
data almost exactly. 


A recent re-examination of the auditory 
numerosity data revealed something that 
should have been obvious from the time that 
the study was done. That is the fact that the 
distributions of the responses to stimulus 
sequences whose durations exceeded 300 milli- 
seconds are extremely skewed in the positive 
direction. This was clearly shown in the 
tabular results which were published along 
with the graphical functions. In going back 
to the original data books it was further noted 
that this extreme skewness of the data was 
created by the responses of only one of the 
five subjects. This particular subject seemed 
to be unable to make reliable judgments to 
the longer stimulus sequences, and began mak- 
ing responses which were completely out of 
line with his responses to the shorter 
sequences. The remainder of the subjects, 
however, continued to respond reliably, and 
there was good agreement among them in 
terms of the actual number of pulses per- 
ceived. In retrospect it is clear that in such 
a situation the mean was not the proper 
measure of central tendency to be used. 
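
The effect described above can be illustrated with a minimal sketch in Python. The response values are hypothetical, chosen only to mimic the situation of four subjects in close agreement and one subject whose responses are out of line; they are not data from the auditory study, and the function name is illustrative.

```python
from statistics import mean, mode


def summarize(responses):
    """Return the mean and the mode of a list of perceived-number responses."""
    return {"mean": mean(responses), "mode": mode(responses)}


# Hypothetical responses of five subjects to one long stimulus sequence.
# Four subjects agree; the fifth gives a response far out of line.
hypothetical_responses = [5, 5, 5, 5, 11]

summary = summarize(hypothetical_responses)
print(summary["mean"])  # pulled upward by the single deviant subject
print(summary["mode"])  # the value reported by most of the subjects
```

Under these assumed values the mean lies above the response given by four of the five subjects, while the mode is unaffected by the single deviant subject.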


As a result of the observations discussed in 
the preceding paragraphs it was felt that some 
additional insights into the nature of the 
visual numerosity function might be gained 
by considering the modes rather than the 
means of the response distributions. Because 
of the situation which had been encountered 
in the auditory numerosity study it was also 
felt that the relationship between the various 
sense modalities in this regard might be better 
understood if the modal responses were used 
as the basis of comparison. 


In Figure 11 the modal values obtained in 
several of the numerosity experiments have 
been presented together in order to see 
whether there seem to be any consistent pat- 
terns in the results other than those already 
discussed. The results included are those 
which were obtained with the 30 per second 
stimulus rate in the visual, auditory, and 
tactual studies previously reported (Cheat- 


[Figure 11. Legend entries include VISUAL (10 M SEC. FLASH) and VISUAL (5 M SEC. FLASH). Abscissa: NUMBER OF STIMULI.]

Fig. 11. Perceived number (modal values) as a
function of the number of stimuli presented (and of
the time required to present them). (Details in text.)


ham & White, 1952; White & Cheatham, 
1959), and also the results obtained in Ex- 
periment III of the present study. Since all 
these modal values are plotted as a function 
of the time required for presentation, the fact 
that the latter data represent responses to a 
25 per second stimulus rate is not important. 
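
The conversion from number of stimuli to time of presentation can be sketched as follows. It is assumed here, since the text does not specify it, that the time required for presentation is measured from the onset of the first stimulus to the onset of the last; the function names are illustrative only.

```python
def onset_times_ms(n_stimuli, rate_per_sec):
    """Onset time (in msec after the first stimulus) of each stimulus in a train."""
    return [round(i * 1000.0 / rate_per_sec, 1) for i in range(n_stimuli)]


def presentation_time_ms(n_stimuli, rate_per_sec):
    """Time from the onset of the first stimulus to the onset of the last."""
    return onset_times_ms(n_stimuli, rate_per_sec)[-1]


# The same number of stimuli occupies different times at the two rates,
# so results plotted against time can be compared across rates.
print(onset_times_ms(4, 30))
print(onset_times_ms(4, 25))
print(presentation_time_ms(10, 30))
```

On this assumption the tenth stimulus of a 30 per second train begins 300 milliseconds after onset, whereas at 25 per second the same point in time falls between the eighth and ninth stimuli.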

When these data are compared in this way 
some interesting relationships can be noted. 
For one thing it can be seen that the auditory 
and tactual modal responses are identical, at 
least for the range of stimulus-sequence dura- 
tions covered in the tactual numerosity study. 
It can also be seen, as mentioned previously, 
that the auditory numerosity function shows 
a definite “break,” or leveling-off, in the region 
of 300 milliseconds. Thus this critical point 
in time appears to hold for the other major 
senses as well as vision. 

A closer examination of Figure 11 also re- 
veals another interesting aspect of these data. 
That is, the differences in the absolute
number perceived in the various modalities,
or within the visual modality under different
conditions, when a given number of stimuli are
presented, seem to arise because specific op-
portunities for adding perceptual units tend
to be missed in those situations wherein the
absolute number perceived is smaller. The
evidence for this conclusion may not be im- 
mediately apparent to the reader. This is 
not surprising, since it is not based entirely on 
the modal data presented in Figure 11, 
although an analysis of the results which are 
presented there will show that this seems to 
be the case. Since this finding, if true, could
be quite important, a more detailed discussion 
follows. 




In comparing the results of the auditory 
study with the results of that visual study in 
which the absolute number of perceived units 
was the greatest for a given number of stimuli 
presented, a good example of this phenomenon 
can be found. (This particular visual study 
was the one in which a 1,000-ft-L adaptation 
level and a 10-millisecond flash duration were 
employed.) In the auditory results it can be 
seen that the critical time for the addition of 
a second perceptual unit had already occurred
by the time the second stimulus was presented 
(at this particular rate of presentation). In 
the case of the visual results, however, it ap- 
pears that this particular opportunity for add- 
ing a perceptual unit was missed, probably 
because of the brief period of initial fusion 
that seems to exist even at this high level of 
adaptation. As the number of stimuli per 
sequence is increased it can be seen that sub- 
sequent perceptual units appear to be added 
to both modalities at about the same time 
after the onset of stimulation. That is, fol- 
lowing the initial fusion period the visual func- 
tion tends to parallel the auditory function, 
with perceptual units being added at the same 
rate and at about the same points in time fol- 
lowing the onset of stimulation. It is thus 
seen that the superiority of the auditory nu- 
merosity function, in terms of the absolute 
number of perceptual units reported by sub- 
jects in response to a given number of stimuli, 
is established at the very onset of stimulation, 
and this superiority is then maintained 
throughout the function—at least as far as 
we have the data for comparison. 

It is of interest to note that the absolute 
level of the visual numerosity function ob- 
tained in the earlier study exceeds those ob- 
tained in the present study. The only major 
difference between these studies is in the 
duration of the individual flashes used. As 
noted, in the earlier work each flash lasted 10 
milliseconds, while in the present study a flash 
tube was used which produced flashes lasting 
only a few microseconds. For present pur- 
poses it is only necessary to note that this 
difference in level appears to be due to the 
fact that in the present study, represented in 
Figure 11 by the results of Experiment III, 
the subjects missed a specific opportunity to 


add a perceived unit (about 200 milliseconds 
after onset of stimulation) and therefore their 
responses tended to be lower from that point 
in the function onward. In other words, it 
would appear that some aspect of the condi- 
tions of the present study precluded the addi- 
tion of a perceptual unit at a specific critical 
point in time following the onset of stimula- 
tion, a point in time at which a perceptual 
unit was added in the earlier study. 

The phenomenon being discussed was first 
noticed when the results of Experiment II of 
the present study were plotted. It was found 
that the numerosity functions for the red and 
the blue stimuli were practically identical up 
to a certain point in time after the onset of 
stimulation. At that point in time, however, 
the function for the blue stimulus indicated 
the addition of a perceptual unit, while there 
was no such increase in the function for the 
red stimulus. Following this the two functions 
tended to increase in an identical manner, 
with the red function now being below and 
parallel to the blue function. In other words, 
the two functions seemed to be identical 
except for the sudden increase exhibited by 
the blue function at a certain point in time. 
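
The pattern just described, two functions that coincide up to a certain point and run parallel thereafter, can be sketched with a simple step model. The critical times used below (80, 160, and 240 milliseconds) and the counting rule are illustrative assumptions only, not values obtained in these studies.

```python
def perceived_number(duration_ms, critical_times_ms, missed=()):
    """Count one unit at onset plus one for each critical time that has
    been reached and was not missed (hypothetical counting rule)."""
    added = sum(
        1 for t in critical_times_ms
        if t <= duration_ms and t not in missed
    )
    return 1 + added


# Hypothetical critical points in time (msec after onset of stimulation).
CRITICAL_TIMES_MS = [80, 160, 240]
durations_ms = [0, 100, 200, 300]

# One function uses every opportunity; the other misses the one at 160 msec.
upper = [perceived_number(d, CRITICAL_TIMES_MS) for d in durations_ms]
lower = [perceived_number(d, CRITICAL_TIMES_MS, missed={160}) for d in durations_ms]

print(upper)
print(lower)
print([u - l for u, l in zip(upper, lower)])
```

In this sketch a single missed opportunity leaves the two functions identical before the critical point and separated by a constant one unit after it, which is the form of difference described in the text.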

This completely unexpected finding seemed 
to indicate that there was something about the 
red stimulus which prevented the subjects 
from perceiving an additional flash at a 
specific critical point in time after the onset 
of stimulation, but which did not seem to have 
any noticeable effect at any other time. Such 
a finding, of course, led to much discussion 
and many observations under various view- 
ing conditions to try to determine if this was 
strictly a color effect or if some other factor 
might be involved. In the course of these ob- 
servations it was noted that the visual angle 
subtended by the stimulus seemed to affect 
the perceived number. This led to Experi- 
ment III. 

When the much smaller stimulus area was 
used the difference between the blue and the 
red functions no longer existed. With the 
small red stimulus the subjects were able to 
perceive an additional flash at that point in 
time at which the two functions had sepa- 
rated when the larger stimulus area was used. 
This seems to indicate that some sort of 




spatial inhibition effect is set into operation 
as the visual angle subtended by the red stim- 
ulus is increased, but no such effect occurs 
in the case of a blue stimulus—at least for 
the range of stimulus areas sampled in these 
two exploratory experiments. Such a finding, 
if confirmed, could open an interesting new
field for visual research. However, it does 
represent a digression from the theme of the 
present paper. The important thing, for the 
present, is that the effects brought about by 
such variations in viewing conditions seem 
to occur at specific points in time. An un- 
derstanding of the processes underlying such 
effects must come as a result of research di- 
rected specifically to that problem. 

On the basis of the findings made to date 
on this topic, and because of certain consider- 
ations discussed in the preceding paragraphs, 
it is assumed that the modal values of the 
auditory numerosity function probably pro- 
vide us with the best available estimates of 
the temporal characteristics of the central 
process underlying the perceptual phenomena 
which have been encountered in these studies. 
The rates of increase in perceived number 
and the various critical points in time ex- 
hibited in the auditory results also appear to 
apply to the other sense modalities studied. 
The results obtained in the study utilizing 
tactual stimulation were exactly the same as 
the auditory results in all these respects for 
the range of stimulus sequences employed, and 
we have no reason to believe that any differ- 
ences would be shown if longer sequences were 
to be used. 

The differences noted between the audi- 
tory results and the results of the various 
visual studies appear to be due to certain 
peripheral effects, the exact nature of such 
effects being determined by the viewing con- 
ditions and stimulus characteristics employed. 
Since the auditory numerosity function pre- 
sumably indicates the maximum number of 
sequential perceptual units available to a 
subject for any given duration of stimulation, 
it is seen that a comparison of the results of 
the auditory and the various visual numer- 
osity studies should provide us not only with 
information regarding the underlying central 


process but also with information concerning 
processes in the peripheral visual system. 


Perceptual “Shrinkage” and Polyopia 


Further evidence in favor of the discon- 
tinuity hypothesis comes from the work done 
with the “shrinkage” phenomenon. This phe- 
nomenon can be observed when an illumi- 
nated arc (of 36° extent, for example) is 
rotated at less than fusion speed while the 
eyes remain fixated on some spot, such as the 
axis of the rotating disc. Under these condi- 
tions the illuminated arc appears to shrink to a 
fraction of its actual length, under optimal 
conditions being seen as a mere point of light. 
Ansbacher (1944), after having performed a 
series of studies dealing with this phenomenon, 
showed, by means of an interesting analogy 
to photography, that he could explain such 
a phenomenon if he were allowed to assume 
the existence of a critical stimulation period 
and stroboscopic illumination. He then said: 

Actually, the stimulation in our experiments was 
continuous and still the existence of shrinkage has 
been demonstrated. It therefore becomes necessary to 
assume that physiological intervals break up the 
physical continuity, thereby creating the same physio- 
logical situation as would the hypothetical 80 ms. 
stimulation period. Thus, we advance the hypothesis: 
the visual mechanism is active for only a certain 
period, which is followed by another period of inac- 
tivity; in other words, the visual mechanism func- 
tions in pulsations [p. 13]. 


Ansbacher then pointed out that this was 
not such a radical idea as it might appear, 
referring to statements of earlier workers and 
to various physiological findings which seemed 
to indicate the cyclic nature of sensory sys- 
tems. He concluded this section of his paper 
by saying:

Since according to our theory pulsations transform 
real movements into discrete physiological “stills,” 
and since nevertheless we perceive movement, it fol- 
lows that the “stills” would enter perception re-trans- 
lated into movement, even as the stills in moving 
pictures (and all apparent movement) are translated 
into perceptual movement. Perception of real move- 
ment would thus rest on the same principle as per- 
ception of apparent movement, with the exception 
that the “stills” in the former would be of physio-
logical rather than physical origin [p. 13].


It is worthwhile to compare these state- 
ments by Ansbacher with the quotation from 




Bergson, which is included in the introduc- 
tion, regarding the “Cinematographical Mecha- 
nism of Thought.” 

Ansbacher’s hypothesized “discrete physio- 
logical ‘stills’ ” take on added interest in view 
of the following quotation from Teuber 
(1959): 

In impaired portions of defective visual fields, the 
perception of a continuous motion is frequently dis- 
sected into a series of multiple stationary images, 
quite analogous to the phenomena obtained for nor- 
mal observers in Brown’s experiments when the target 
speed exceeded certain values. (Thus, one patient 
with a gunshot wound of the right occipitotemporal 
region complained that when a motorcycle passed 
him to his left, he saw instead “a string of motor- 
cycles standing still.”) [P. 1645.] 


This phenomenon, called polyopia, also
occurs during mescal intoxication. In the
earlier stages of mescal intoxication (Schilder,
1942)

real movements are either not recognized at all or are
split off into a succession of single interrupted per-
ceptions. . . . Instead of a curve in which a lighted
object is moved, a multiplicity of shining points is
seen [pp. 33-38].

In Ansbacher’s terms, it would appear that 
some process responsible for the retransla- 
tion of the “physiological stills” into perceived 
movement is disrupted by certain types of 
cortical lesions and by the action of certain 
drugs. 

The phenomenon of polyopia is seen as 
providing evidence in favor of the discon- 
tinuity hypothesis, and at the same time it 
seems to indicate the presence of some inte- 
grative process underlying the perception of 
movement—indeed, perhaps underlying the 
apparent continuity of all sensory experience.⁸
A statement made by Miller and Garner 
(1944) in regard to the quantal theory of 
discrimination seems appropriate at this time: 


8 Bergson (1913) stated: 

Intellect . . . always starts from immobility, as 
if this were the ultimate reality: when it tries to 
form an idea of movement, it does so by construct- 
ing movement out of immobilities put together. 
. . . When it substitutes for movement immobilities 
put together, it does not pretend to reconstitute the 
movement such as it actually is; it merely replaces 
it with a practical equivalent [p. 155]. 


There is no simple reconciliation to be made be- 
tween the discreteness of receptor cells and the seem- 
ing continuity of sensory experience, but in general 
there seems to be two ways of regarding this paradox. 
Either continuity of excitation must be demonstrated 
in nervous tissue, or discreteness must be shown in the 
apparent continuum of experience [p. 451]. 


Cortical Action Time 


Reaction-time studies have occasionally 
been utilized in an attempt to obtain an 
estimate of the minimum time consumed by 
“cerebral operations.” J. McKeen Cattell 
(1886) performed a study on visual reaction 
time in which he made such an attempt. 
By subtracting the time assumed to be taken 
up by neural transmission, chemical trans- 
formation in the retina, latency at myoneural 
junctions, etc., Cattell was left with a re- 
mainder of about 75 milliseconds which he 
could not account for by any of these means. 
He concluded that this represented the time 
taken up by cerebral operations. 

This study has been repeated by Monnier 
(1952). Monnier simultaneously recorded 
ERG, EEG, and EMG (from the finger with 
which the response was made). Given the 
information obtainable from these records, 
such as the start of the ERG response, the 
time of occurrence of the evoked cortical re- 
sponse and the start of the EMG response 
in the responding finger, it was only neces- 
sary for him to estimate the neural transmis- 
sion time from the cortex to the finger. From 
the values obtained in this way Monnier was 
able to establish that the minimum “opto-
motor integration time” was approximately
75 milliseconds. He concluded: 

This time gives information on the approximate 
duration of the associative and integrative processes 


performed in the great surface net of the brain, con- 
sidered as a bridge from input to output [p. 485]. 


The possible relevance of these findings to 
the present topic is clear if it is remembered 
that the minimum time necessary for the ad- 
dition of each sequential perceptual unit dur- 
ing the initial segment of the numerosity 


function was estimated to be about 75-80


milliseconds (cycle times corresponding to 
the estimated maximum rate of increase of 
perceived units of 12-13 per second). 
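
The correspondence between rate and cycle time is a matter of simple arithmetic, sketched below. The function name is illustrative; the only inputs are the rates of 12 and 13 per second given in the text.

```python
def cycle_time_ms(rate_per_sec):
    """Time per cycle, in milliseconds, for a given rate in cycles per second."""
    return 1000.0 / rate_per_sec


for rate in (12, 13):
    print(rate, round(cycle_time_ms(rate), 1))
```

The computed values, roughly 77 to 83 milliseconds, fall in the neighborhood of the 75-80 millisecond estimate and of the 75 millisecond figure reported by Cattell and by Monnier.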




Perceived Flicker in the Absence of Intermit- 
tent Stimulation 


If it is indeed the case that there is a cyclic 
central process that has such a profound ef- 
fect on perception as is suggested by the 
findings up to this point, it is reasonable to 
assume that under certain conditions flicker 
might be perceived when, in fact, the visual 
stimulation is nonintermittent. 

A discussion of just such situations is to 
be found in a note by W. Nagel in the classic 
text on physiological optics by Helmholtz 
(1924, p. 311). It is pointed out that
one of the symptoms commonly associated 
with migraine is the so-called flicker scotoma, 
in which a portion of the visual field de- 
velops a pronounced flicker. This flickering 
area is rather small at first, but may gradually 
increase to include the entire visual field. 
It is referred to as a scotoma because the 
area affected is insensitive to any external 
visual stimulation. A somewhat similar phe- 
nomenon can also be produced with a normal 
subject if a slight pressure is exerted on his 
eye when he is in complete darkness. In 
both the above situations the apparent rate 
of flicker is judged to be between 10 and 15 flashes
per second. 

Even if a subject is in complete darkness 
there is a certain level of spontaneous activity 
in the retina, resulting in a residual volume 
of neural impulses being transmitted to the 
central visual system. This spontaneous ac- 
tivity is sometimes suggested as the basis of 
the phenomenon of “cortical gray,” that is, 
the reported fact that a person does not per- 
ceive “blackness” when he is in complete dark- 
ness, but rather perceives a dim “grayness.” 
According to these suggestions true black can 
only be produced in situations where there 
are marked brightness contrast effects. In 
neurophysiological terms this would be in those 
cases in which some kind of spatial inhibitory 
effect is able to prevent even the spontaneous 
activity in certain retinal areas. If such is 
the case, perceptual black is to be considered 
as resulting from an active neural process, 
and not merely the absence of external stimu- 
lation. 

The slight pressure on the eye is presumed 
to increase the level of the spontaneous ac-
tivity to a point that the cortical gray is
more pronounced and hence the flicker might 
be more readily observable. Nagel states that 
he, among others with whom he discussed this 
matter, could observe this phenomenon oc- 
casionally even without pressure being ex- 
erted on the eyes. The pressure merely inten- 
sified the experience for him, but other 
observers could not experience this “physio- 
logical flicker” without the pressure stimula- 
tion. 

Nagel concludes his note on this topic by 
saying: 
As to the causes of this phenomenon, as also of the 
flicker in migraine, positive opinions can hardly be 
ventured at present [Helmholtz, 1924, p. 312]. 


On the basis of this interesting, but very 
indirect, evidence, it can be tentatively con- 
cluded that whatever central process is in- 
volved can produce the appearance of inter- 
mittency in the absence of intermittent 
stimulation, as well as limit the rate at which 
sequential perceptual units can be added when 
intermittent stimulation is applied to an 
observer. 


Apparent Rate Paradox 


If it is true that some central process deter- 
mines the rate at which sequential perceptual 
units may be added, why is it that the ap- 
parent rate of flicker for a given objective 
stimulus rate varies so markedly as a func- 
tion of the retinal locus of stimulation? 

The fact that the apparent rate of flicker 
does decrease markedly as the locus of stimu- 
lation is moved away from the fovea was first 
reported by Le Grand (1937), and can easily 
be verified by anyone who cares to observe 
this phenomenon. This has been the one find- 
ing that does not appear to fit the hypothesis 
that some definite central process acts to 
establish the maximum rate at which sequen- 
tial perceptual units can be added. This 
hypothesis, meant to apply to vision, audi- 
tion and touch, would appear to be on very 
shaky ground indeed if it did not hold for 
at least the major part of the visual field. 
The present study was undertaken primarily 
to obtain more information concerning the
numerosity functions for nonfoveal stimula-
tion. It was hoped that this additional in- 
formation might provide some clue to the 
basis for the apparent rate phenomenon. 

The results have already been discussed 
in some detail. The important finding, in re- 
gard to the problem being discussed, was that 
after about the first 300 milliseconds the rate 
of increase in the perceived number of flashes 
was approximately the same for all the retinal 
loci stimulated. It is to be recalled that this 
same finding was reported by Forsyth and 
Chapanis (1958). 

The reason that such a finding was unex- 
pected is that prior to this time it had been 
assumed that the subjective rate was directly 
related to the slope of the numerosity func- 
tion. In fact, in the earlier papers on this 
topic the term subjective rate was operation- 
ally defined as the slope of the function. 

The fact that the rate of increase in per- 
ceived number tends to remain the same re- 
gardless of what part of the retina is stimu- 
lated is quite important, since it effectively 
counters a major objection to the central 
process hypothesis. The question posed at 
the beginning of this section can now be 
answered by saying that the apparent rate of 
flicker does not appear to be determined by 
the rate at which the sequential perceived 
flashes are added. 

Even after he knows that the rate of in- 
crease of the perceived number of flashes is 
the same for a given intermittent stimulus 
presented at the fovea and at a peripheral 
locus, it is difficult for an observer to believe 
it. When extreme retinal loci are stimulated, 
such as in Experiment II of this study, the 
resulting apparent rate seems to be of a com- 
pletely different order of magnitude than 
that which is experienced when the stimula- 
tion falls on the fovea. 

This paradoxical decrease in the apparent 
rate of flicker with peripheral stimulation, 
with no corresponding decrease in the rate at 
which perceived flashes are added to a flicker 
sequence, deserves further investigation. For 
the present, however, it can only be pointed 
out that these two “rates” are basically dif- 
ferent phenomena. 


Temporal Numerosity Phenomena and Neuro- 
physiology 


It has been suggested that studies dealing 
with the temporal numerosity phenomena 
should provide information which would be 
very useful to neurophysiologists interested 
in correlating neural and behavioral events. 
It is not the aim of this paper to try to ex- 
plain the neural bases of the phenomena which 
have been discovered. However, to quote 
Fessard (1959, p. 256), it would be “difficult 
and to a certain extent artificial” to end this 
discussion without at least pointing out cer- 
tain processes and events whose temporal 
characteristics are markedly similar to those 
exhibited by the temporal numerosity func- 
tion. 


It is not necessary here to present a com- 
plete review of the studies dealing with the 
gross neural processes which would most likely 
be related to our perceptual findings. Some 
good sources to consult if such a thorough 
coverage is desired are the following chapters 
in the Handbook of Physiology (Magoun, 
1959): “Intrinsic Rhythms of the Brain,” by 
W. Grey Walter; “The Evoked Potentials,” 
by Hsiang-Tung Chang; and “Central Mecha- 
nisms of Vision,” by S. Howard Bartley. 
Each of the above authors has approached
the problem in a different way, so a rather
broad understanding of the current state of
knowledge in this area can be obtained by 
the careful study of these articles. It is of 
special interest to note that certain specific 
findings are discussed by all three authors, 
each from his own point of view. It so hap- 
pens that these findings deal with those proc- 
esses that would seem most likely to be related 
to the perceptual phenomena which have been 
discovered in the various studies on temporal 
numerosity. 

The characteristic of central neural activity 
that immediately suggests a possible relation- 
ship to the numerosity findings is the so- 
called “cortical excitability cycle.” When two 
stimuli are applied in sequence the amplitude 
of the evoked response to the second stimu- 
lus, recorded at the sensory cortex, varies as 
a function of the time between the two stimuli. 
This variation in amplitude is cyclic with a
regular periodicity. Chang (1959) described
this phenomenon as follows:


Unique to the sensory area, the excitability state of 
the cortex does not always return to the normal level 
following the completion of the usual excitability 
cycle but undergoes a further cyclic waxing and 
waning with regular intervals. The periodic excita- 
bility change of the visual cortex was described by 
Bishop in 1933. . . . The periodic variation in excita- 
bility of the auditory cortex beyond the unresponsive 
period caused by a sound stimulus was observed by 
Jarcho. He noticed the periodic depression of the 
cortex at a frequency coincidental with the repetitive 
corticothalamic after-discharges. . . . [His] finding 
was soon confirmed with the further disclosure that 
in company with the rising and falling of the cortico- 
thalamic reverberating waves, there are concomitant 
increase and decrease of cortical excitability. . . . 
Though intimately related with the corticothalamic 
reverberating waves, the periodic variation of cortical 
excitability following an afferent stimulation may be 
manifested in the absence of detectable repetitive 
waves. At the onset of continuous illumination of
the retina, for instance, the waxing and waning of 
the cortical excitability can be demonstrated even 
within the prolonged period of postexcitatory depres- 
sion during which the reverberating waves are not 
distinctly visible. The significance of the periodic 
excitability change of this kind is not known [pp. 
309-310].9 


9 It is interesting to speculate about the relationship
of this excitability cycle to the moment-to-moment
variations in a subject’s threshold. Such a central
process could account for an appreciable fraction of
the variance which is found. Boynton, for example,
feels that some such central process is probably in-
volved in threshold variability. As a result of certain
of his studies Boynton (1961) concludes that

the sensitivity of the visual system is very pre-
cisely specifiable at a given moment . . . [since
he obtained] significant differences in the thresholds
for test flashes separated by only two or three
milliseconds [p. 752].

Blackwell (1963) has recently shown that certain
apparent inconsistencies in visual threshold data “can
be explained quantitatively by assuming the existence
of a CNS scanning mechanism which ‘cycles’ six times
per second [p. 157].”


Chang pointed out that the primary re-
sponse of the sensory cortex to an afferent
volley is often followed by a train of regu-
larly spaced surface-positive waves with in-
tervals ranging from 50 to 150 milliseconds,
the exact rate depending on the state of
anesthesia and the particular area of the
sensory cortex involved. He cited a number
of findings which indicate that these periodic
after discharges represent the activity of the
reverberating circuit between the sensory cor-
tex and the thalamus. The interval between
these reverberating waves is much too long to 
be accounted for by conduction time and the 
number of synapses in the corticothalamic
circuit, so Chang suggested that the rate ex-
hibited by the after discharges is probably
actually established by the cortical excitability 
cycle. That is, it is probable that among the 
returning impulses only those which arrive at 
certain phases of the excitability cycle are 
capable of initiating synchronous discharges 
in the cortical neurons. Thus the rate ex- 
hibited by the after discharges in the sensory 
cortex is probably representative of the tem- 
poral characteristics of the cortical excitability 
cycle and not of the corticothalamic circuit 
itself. 

What is of particular interest here is the 
range of the intervals between adjacent after 
discharges which was reported, correspond- 
ing to cycle rates ranging from about 7 to 20 
per second. At another point in his review 
article Chang reported that the somatic sen- 
sory cortex is not able to respond fully to 
peripheral stimulation at a rate higher than 
7 per second in animals under barbiturate 
anesthesia, while under other conditions the 
rate may be as high as 14 per second (Chang, 
1959, p. 302). 

The rates quoted in the preceding para- 
graph are seen to be quite compatible with 
the rates of increase of perceptual units indi- 
cated by the numerosity functions—a maxi- 
mum of 12-13 per second during the initial 
segment and about 7 per second after the criti- 
cal 250-300 millisecond point had been passed. 
The only implication being made here is that 
the hypothesis that the temporal numerosity 
function may provide us with information re- 
garding the characteristics of a central process 
involved in certain aspects of perception ap- 
pears to be plausible, at least so far as the 
cycle duration of such a process is concerned. 
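
The compatibility claimed here rests on converting Chang's intervals to rates and comparing them with the numerosity rates. A minimal sketch of that arithmetic follows; the function and variable names are illustrative assumptions, and only the numerical values come from the text.

```python
def interval_ms_to_rate(interval_ms: float) -> float:
    """Convert an interval between successive waves (ms) to a rate per second."""
    return 1000.0 / interval_ms


# Interval range reported by Chang for the after discharges (from the text).
chang_rate_low = interval_ms_to_rate(150)   # roughly 6.7 per second
chang_rate_high = interval_ms_to_rate(50)   # 20 per second

# Rates of increase in perceived number quoted for the numerosity functions.
numerosity_rates = {
    "initial segment": (12, 13),
    "after 250-300 ms": (7, 7),
}


def within_chang_range(rate: float) -> bool:
    """True if a rate lies inside the range implied by Chang's intervals."""
    return chang_rate_low <= rate <= chang_rate_high


for label, (low, high) in numerosity_rates.items():
    ok = within_chang_range(low) and within_chang_range(high)
    print(f"{label}: {low}-{high} per second, within Chang's range: {ok}")
```

Such a check shows only that the two sets of figures overlap; it says nothing about whether the same process underlies both.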

In his chapter in the Handbook of Physi- 
ology, Walter (1959) described a number of 
visual phenomena, some of which have al- 
ready been discussed in this thesis, which he 
believed can best be explained in terms of 
“interaction between rhythmic volleys of im-
pulses in the visual pathways with the intrin-
sic scanning rhythms [p. 293].” He also
briefly reviewed the studies which have at- 
tempted to relate motor behavior to central 
neural rhythms. The results of such studies, 
while tending to show such a relationship, 
are not as clear-cut as the results obtained 
in the sensory work.10

Walter pointed out that some of the con- 
flicting findings reported might be because 
of a confusion in terminology, especially in 
regard to the alpha rhythm. For example, 
the evoked cortical rhythm which follows 
sensory stimulation has many features which 
differ from those of the spontaneous rhythm 
which may have been exhibited prior to stimu- 
lation, although their rates may be the same. 
For one thing the evoked rhythms are time 
locked to the stimuli which bring them about, 
so such a rhythm may be out of phase with 
the rhythm existing at the time of stimula-
tion. This complex interaction between stimu-
lus impulses and cortical rhythms, wherein
the first stimulus impulses appear to set in 
operation a process that then affects the cor- 
tical response to later impulses, would cer- 
tainly complicate any attempt to correlate 
peripheral motor events with central processes. 

10 The rate limitation in the various sense modali-
ties, as shown by the temporal numerosity results,
has an interesting analogue in simple reflex-arc ac-
tivity. Sherrington (1906), in his classic book The
Integrative Action of the Nervous System, pointed
out that one of the differences between nerve-trunk
conduction and reflex-arc conduction is the less
close correspondence in the latter between rhythm
of stimulus and rhythm of end effect. In nerve-trunk
conduction impulses correspond closely to the stimuli
applied in number and rhythm. With reflex arcs,
however, one will find

undulations of a frequency of 10-12 per sec. on
myograms of spinal reflexes evoked by excitation
of the afferent nerve by faradic currents of fre-
quency much above 10-12 per sec. . . . The rhythm
of discharge from the motor-cell, as far as the
undulations noted indicate rhythmic response, are
totally different in rhythm from that of the action
induced in the afferent cell by the stimulation ap-
plied. In the reflex centre the rhythm has been
transmuted from one rate to another . . . [also]
undulations at rates varying between 7.5 and 12
per sec. are seen in the flexion-reflex . . . quite
independently of the rate of delivery of the induc-
tion shocks used as stimuli, and even when the
stimulus is a constant current [pp. 42-43].


For the present we must be content with
just emphasizing the fact that a great deal
of confusion has resulted in the past from the
failure of workers in this area to differentiate 
between the cyclic phenomena exhibited by 
EEG (such as the alpha rhythm) and the 
processes underlying these phenomena. En- 
tirely different processes may underlie what 
appear to be identical EEG records, for ex- 
ample, and the absence of a phenomenon in 
the EEG record (such as in alpha blockage) 
does not necessarily mean that the underlying 
process is not still active. The necessity of 
differentiating between spontaneous and 
evoked brain rhythms was one of the things 
emphasized by all three of the authors whose 
review articles are being discussed. Whether 
or not exactly the same neural pathways are 
involved in these two types of rhythms is a 
question which cannot as yet be answered, 
but the evidence to date seems to indicate that 
at least they do have some common elements. 
This question, however, goes far beyond what 
is relevant to our present problem. 

Bartley, as the title of the cited review article 
would indicate, concentrated on the results 
of neurophysiological studies involving various 
parts of the visual pathways. After bringing 
together the pertinent work on this topic he 
described a number of perceptual phenomena 
which he felt could be at least partially ex- 
plained in terms of certain of the neurophysi- 
ological processes he had discussed. We have 
already encountered many of the examples 
he used, such as the brightness enhancement 
phenomenon. 

One of the items he discussed, however, 
is of special interest because of its possible 
relationship to one of the characteristics of 
the numerosity function. In one of his studies 
Bartley recorded directly from the optic cor- 
tex of a rabbit while the animal was being 
stimulated by flashes occurring at a rate about 
twice its alpha frequency. The pattern of 
the cortical response to this type of stimula- 
tion was as follows: 

First, there was the typical evoked re- 
sponse to the onset of the stimulation. This 
was followed by a period of time during which 
the cortical activity was quite irregular. At a 
certain point in time following the onset of 
stimulation, however, the cortical activity as-
sumed a regular, cyclic appearance. Bartley
assumed that the period of irregularity fol- 
lowing the initial evoked response was neces- 
sary for some cortical process to get into 
synchrony with the incoming bursts of im-
pulses.11

A finding reported by Siebert and the Com- 
munications Biophysics Group (1959) of the 
Massachusetts Institute of Technology seems 
to be analogous to that which Bartley reported. 
By means of an average response computer it 
was possible to obtain an indication of the na- 
ture of the activity in the human occipital cor- 
tex following the onset of visual stimulation. 
The results of this investigation can be described 
in almost the same words as were used to de- 
scribe Bartley’s results. Following the initial 
evoked response a period of irregular activity 
ensued. At a certain point in time after the 
onset of stimulation (from the records in- 
cluded in their report this appears to be about 
250-300 milliseconds) the cortical activity 
assumed a regular cyclic form. The frequency 
of this cyclic activity was approximately the 
same as the subject’s alpha rhythm, but was 
time locked to the onset of the stimulation. 
Thus it appears to have the characteristics 
of the evoked rhythms discussed at length by 
Chang. It should be pointed out that the 
sequence of activity here described was in 
response to a single bright flash of light, 
and not to rapid intermittent stimulation as 
was the case in Bartley’s study. Because of 
this it is tentatively assumed that the se- 
quence of events found in both types of study 
is set into action by the first stimulus of a 
series. It would be very instructive to see 
what changes, if any, would be found if simi- 
lar records were to be obtained from human 
subjects presented with rapid sequences of 
stimuli similar to those utilized in the tem- 
poral numerosity studies. 

11 In some more recent work Bartley, Nelson, and
Ranney (1961) have suggested that this “reorganiza-
tion period” may be related to certain perceptual phe-
nomena occurring at the onset of visual stimulation,
such as the subject’s inability to detect the second
flash of a stimulus pair separated by a short interval.
It can be seen that the concept of the “initial fusion
period,” discussed earlier in this paper, represents a
different (peripheral) explanation of this situation.


The change in the nature of the cortical
activity at a certain point in time following
the onset of stimulation, noted in the two
studies just mentioned, reminds one of the 
critical point in time at about 250-300 milli- 
seconds following the onset of stimulation, 
which seems to be one of the characteristics 
of the numerosity function. It is to be re- 
membered that that point in time marked the 
beginning of a steady rate of increase in per- 
ceived number, this rate apparently being 
unaffected by the various parameters involved 
in the several studies. As was pointed out in 
the earlier section of this discussion which 
dealt with the form of the numerosity func- 
tion, this has been interpreted to mean that 
this point in time probably marks the onset 
of an important central process, or the point 
of transition from one basic process to an- 
other. The neurophysiological evidence just 
discussed would seem to indicate that such a 
hypothesis may not be too far wrong. 

In his review article Bartley (1959) stated: 

It may be said, then, that in brightness enhance- 
ment and in the group of findings in regard to the 
way in which the optic pathway is able to react to 
timing of input, we have one of the more fully docu- 
mented sets of the relationship between sensory be- 
havior (perception) and neurophysiology of the cen- 
tral nervous system [pp. 734-735]. 

It is suggested that the temporal numerosity 
technique may provide even more information 
regarding this relationship. In addition to the 
obvious relevance of the rates of increase in 
the perceived number indicated by the nu-
merosity function, the data obtained by this
technique are unique in terms of the very 
specific information regarding critical points 
in time, perhaps even being able to show 
rather precisely when each perceived flash is 
added in a flicker sequence, for example. 
This, added to the fact that it can be used 
equally well with the three major exterocep- 
tive senses, would indicate that the numerosity 
technique should become a valuable tool in 
future research on sensory physiology. 


Re-evaluation of the Temporal Numerosity, 
Perceptual Simultaneity, and Synchroniza- 
tion Techniques 


Bartley, in the review article discussed in 
the preceding section, pointed out that there 
are three basic stimulus timing patterns used 
in visual research. These are (a) single iso-
lated stimuli; (b) paired stimuli, in which
the two members are separated by various 
times; and (c) trains of stimuli, or intermit- 
tent stimulation. He goes on to say that al- 
though all the stimuli may be identical the 
type of information you obtain about the 
visual system is quite different under these 
three different conditions. For example, (a) 
shows the response of a “resting” system; (b) 
shows the effect of the first stimulus on the 
second, or shows how long it takes the system 
to return to normal; and (c) gives information 
regarding the rate-handling capacity of the 
system and may involve very complex inter- 
actions at various stages in the system. 
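
The three timing patterns can be made concrete as lists of stimulus onset times. The following is a minimal sketch only; the function names, the 80-millisecond separation, and the 20-per-second train are hypothetical values chosen for illustration and are not taken from Bartley's article.

```python
from typing import List


def single_stimulus(onset_ms: float = 0.0) -> List[float]:
    """Pattern (a): one isolated stimulus."""
    return [onset_ms]


def paired_stimuli(separation_ms: float, onset_ms: float = 0.0) -> List[float]:
    """Pattern (b): two stimuli separated by a chosen interval."""
    return [onset_ms, onset_ms + separation_ms]


def stimulus_train(rate_per_sec: float, duration_ms: float,
                   onset_ms: float = 0.0) -> List[float]:
    """Pattern (c): intermittent stimulation at a fixed rate.

    Returns every onset that begins before the train duration has elapsed.
    """
    period_ms = 1000.0 / rate_per_sec
    onsets = []
    i = 0
    while i * period_ms < duration_ms:
        onsets.append(onset_ms + i * period_ms)
        i += 1
    return onsets


print(single_stimulus())             # one onset
print(paired_stimuli(80.0))          # two onsets, 80 ms apart
print(stimulus_train(20.0, 300.0))   # onsets every 50 ms for 300 ms
```

With identical flashes in every case, only the list of onset times differs among the three conditions, which is the point Bartley makes about the different information each yields.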

In the introduction it was proposed that 
there were three experimental techniques that 
should provide information regarding percep- 
tual periodicity: temporal numerosity, per- 
ceptual simultaneity, and synchronization. At 
first it was hoped that all three techniques 
might give results that would be directly com- 
parable. A closer examination of what is in- 
volved in these different techniques, however, 
shows that we should not expect identical 
quantitative estimates of the duration of the 
moment from the results of such experiments.
The situation is somewhat analogous to that 
pointed out by Bartley. 

In the first place, entirely different re- 
sponses are demanded of the subject. The 
temporal numerosity technique is much more 
structured than the perceptual simultaneity 
technique, for example, since the subject is 
required to make a definite judgment of the 
number of items he perceived. In the per- 
ceptual simultaneity experiment the subject 
is free to establish whatever criterion he 
chooses for simultaneity of the various forms 
or form segments. All we can honestly hope 
for here is that the span of simultaneity so 
established will be in the same general range 
as the period duration implied by the numer- 
osity results. The finding that a subject tends 
to exhibit about the same span of simul- 
taneity regardless of the number of forms or 
form segments being fused is much more sig- 
nificant than the actual duration of that span 
as indicated by the data. That such con-
stancy in the duration of the span of simul-
taneity does appear to be the case (Lichten-
stein, 1961; Murphree, 1954) is strong
evidence for the existence of a finite per-
ceptual period. That the estimates of the 
span of simultaneity are very similar to the 
periodicity shown by the numerosity results 
tends to support the hypothesis that the two 
phenomena are related. 

The synchronization technique is quite an- 
other situation. Both the temporal numerosity 
and the perceptual simultaneity techniques uti- 
lize intermittent stimulation, while the syn- 
chronization technique does not. Although long 
sequences of flashes are presented to the 
subject these flashes are separated by such 
a long interval that they can be considered 
as individual flashes. In this type of study 
the subject is actually predicting the time of 
onset of the sequential flashes, following a 
practice period during which he has learned 
to tap at the proper rate. The variability in 
the timing of his taps does provide an index 
of his range of uncertainty for synchronizing 
with a recurrent stimulus, but this is not the 
same as the situation in which very rapid 
intermittent stimulation is used and other 
types of responses are called for. 
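
The index of uncertainty mentioned above can be sketched as the spread of the differences between tap times and flash onsets. The data below are hypothetical, and the use of the sample standard deviation is an assumption made for illustration; the text does not specify which measure of variability was used.

```python
from statistics import stdev

# Hypothetical flash onsets and tap times, in milliseconds.
flash_onsets_ms = [0, 1000, 2000, 3000, 4000]
tap_times_ms = [-20, 1015, 1990, 3030, 3985]

# Signed asynchrony of each tap relative to its flash (negative = early tap).
asynchronies = [tap - flash for tap, flash in zip(tap_times_ms, flash_onsets_ms)]

# Sample standard deviation as one possible index of the range of uncertainty.
spread_ms = stdev(asynchronies)

print(asynchronies)
print(round(spread_ms, 1))
```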

On the basis of the above discussion it is 
now felt that only the first two experimental 
techniques mentioned, temporal numerosity 
and perceptual simultaneity, are suitable for 
the study of perceptual periodicity. Of these 
two techniques, the second seems to provide
the more dramatic support for the hypothe-
sized psychological unit of duration, or mo-
ment, discussed in the introduction. This is
because the situation contains most of the 
elements and characteristics proposed for such 
an entity. That is, forms or form segments 
presented in a constantly repeated sequence 
are perceived as simultaneous, with no ap- 
parent movement between the various ele- 
ments, etc. 

Since the criterion of what is just simul- 
taneous is so flexible, however, it is felt that 
a better estimate of the temporal character- 
istics of this basic unit of duration would be 
given by the temporal numerosity technique. 
Therefore this technique should probably be 
utilized whenever an attempt is made to cor- 
relate these phenomena with neurophysio- 
logical processes or events. 

The assumption is being made here that 
the numerosity and simultaneity phenomena
are intimately related. It is to be recalled
that the reasoning which gave rise to the 
concept of the moment would also predict 
some aspects of the numerosity results. So 
far nothing has been encountered which would 
cast serious doubt on such a relationship. 

The portion of the numerosity function 
relating to the perceptual simultaneity data is 
the last segment, that is, from about 300 milli- 
seconds on, which represents the rate of in- 
crease in perceived number after the initial 
transitional phase has been completed. This 
is because in the simultaneity studies the 
subjects were presented with constantly re- 
curring sequences of forms or form segments, 
and they did not make their settings until 
some time after the stimulation had com-
menced. In terms of the discussion in the
preceding section, the subjects made their 
settings after the evoked cortical rhythms 
had been well established. 

It can be seen that another advantage of 
the temporal numerosity technique is that it 
provides data starting at the very onset of 
stimulation, so that events during the transi- 
tional period can also be studied, that is, dur- 
ing what has been referred to as the initial 
segment of the numerosity function. 


SUMMARY 


The present study has provided some im- 
portant information about the visual numer- 
osity function. First, the results confirmed 
the previously reported finding that after an 
initial period lasting approximately 250-300
milliseconds the slope of the numerosity func-
tion was the same regardless of the retinal
locus of stimulation (Forsyth & Chapanis, 
1958). It was also found that this held true 
over a wide range of viewing conditions, as 
long as the flashing light was clearly visible. 
This finding showed that it was necessary to 
make a distinction between the rate at which 
perceived flashes are added to a flicker se- 
quence and the apparent rate of flicker, since 
the latter varies with the locus of retinal 
stimulation. 

The second major finding was that when a 
difference was found in the responses as a 
function of stimulus conditions this difference 
appeared to manifest itself at a specific time
after the onset of stimulation. This effect
was noted only in the initial segment of the 
numerosity function. 

The implications of this last finding led to 
a re-evaluation of the previous studies which
had been performed on temporal numerosity,
including the ones employing audition and 
touch. One thing that became obvious when 
all these results were examined was that the 
mode was a more meaningful measure of the 
subjects’ responses than was the mean. When 
the modal responses obtained in all these 
studies were compared, it was found that the 
results were strikingly similar. There ap- 
peared to be critical points in time after the 
onset of stimulation at which additional per- 
ceived units could be added to a perceptual 
sequence, and these critical points in time ap- 
peared to be the same for all three sensory 
modalities studied. 
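
One way to see how the mode and the mean can diverge for responses of this kind is sketched below. The list of reports is hypothetical and is not data from any of the studies; it merely assumes that subjects report whole numbers and that an occasional report departs widely from the rest.

```python
from statistics import mean, mode

# Hypothetical reports of perceived number for a single stimulus duration.
reports = [3, 3, 3, 3, 4, 3, 3, 2, 3, 6]

mean_report = mean(reports)   # pulled upward by the single report of 6
modal_report = mode(reports)  # the whole number most often reported

print(mean_report)
print(modal_report)
```

Under these assumptions the mean is a fractional value that no subject actually reported, whereas the mode is always one of the reported whole numbers.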

Differences in the absolute response levels 
found between sense modalities and between 
different stimulus conditions in vision ap- 
peared to be due to the failure of the subjects 
to add an additional perceived unit at certain 
specific critical times in those experimental 
conditions yielding the lower functions. 

The marked similarity of the results ob- 
tained with the three different sense modalities 
was interpreted as giving strong support for 
the hypothesis that they all were affected by 
some common central process. It was as- 
sumed that the auditory (and tactual) nu- 
merosity function probably represented the 
best estimate of the temporal characteristics 
of this hypothesized process, the deviations 
shown by the visual function during the initial 
segment, such as the initial fusion period, 
being due to peripheral processes. 

The fact that the numerosity functions for 
all visual conditions studied had essentially 
the same slopes from a point in time about 
300 milliseconds after the onset of stimulation 
onward, plus the fact that the auditory func- 
tion also showed a change in slope at that 
point, led to the conclusion that that point in 
time marked the beginning of some central 
process that would determine the maximum 
rate at which perceived units could be added 
thereafter. More accurately, it was conceived 
of as a point of transition between two such 
processes, the first determining the maximum
rate of increase up to that critical point. The
rates of these two hypothetical processes, 
based on the slopes exhibited by the numer- 
osity functions, were assumed to be about 
12-13 per second for the first and about 
6-7 per second for the second. 

Certain neurophysiological evidence regard- 
ing rhythmic processes in the brain was dis- 
cussed in order to establish the fact that a 
process (or processes) having the temporal 
characteristics suggested by the results of the 
temporal numerosity studies was quite 
plausible. 

The goals set for the present study have 
been attained. The results obtained with the 
various experimental conditions have led to a 
better understanding of the visual numerosity 
phenomena, and have tended to clarify the 
relationship between the different sense 
modalities in this respect. The results of this 
study, and their integration with the other 
findings and topics discussed, have strengthened
the basic hypothesis concerning perceptual
periodicity and have indicated a possible
neural correlate for such periodicity.

This paper began with a discussion of the 
“waxing and waning brain processes” hy- 
pothesized by William James and has con- 
cluded with a discussion of the waxing and 
waning brain processes studied and described 
by Chang and other neurophysiologists. The 
very persuasive and varied arguments in favor 
of some sort of perceptual periodicity, as 
briefly outlined in the introduction, have been 
matched by what is believed to be equally 
convincing evidence that such periodicity is 
indeed a reality. 

Regardless of whether or not it has all the 
characteristics attributed to it by its pro- 
ponents, the recurrent psychological unit of 
duration, or “moment,” is a very intriguing 
and useful concept. If nothing else, it pro- 
vides a fruitful point of common interest for 
psychologists and neurophysiologists. 


REFERENCES 


Ansbacher, H. L. Distortion in the perception of real
movement. J. exp. Psychol., 1944, 34, 1-23.

Bartley, S. H. The psychophysiology of vision. In
S. S. Stevens (Ed.), Handbook of experimental
psychology. New York: Wiley, 1951. Pp. 921-984.

Bartley, S. H. Central mechanisms of vision. In
H. W. Magoun (Ed.), Handbook of physiology.
Vol. 1. Washington, D. C.: American Physiological
Society, 1959. Pp. 713-740.

Bartley, S. H., Nelson, T. M., & Ranney, J. E. The
sensory parallel of the reorganization period in the
cortical response in intermittent retinal stimulation.
J. Psychol., 1961, 52, 137-147.

Bergson, H. Creative evolution. New York: Holt,
1913.

Blackwell, H. R. Neural theories of simple visual
discriminations. J. Opt. Soc. Amer., 1963, 53,
129-160.

Boring, E. G. Sensation and perception in the history
of experimental psychology. New York:
Appleton-Century, 1942.

Boynton, R. M. Temporal factors in vision. In
W. A. Rosenblith (Ed.), Sensory communication.
New York: Wiley, 1961. Pp. 739-756.

Cattell, J. McK. The time taken up by cerebral
operations. Mind, 1886, 11, 220-242.

Chang, H.-T. The evoked potentials. In H. W.
Magoun (Ed.), Handbook of physiology. Vol. 1.
Washington, D. C.: American Physiological
Society, 1959. Pp. 299-313.

Cheatham, P. G., & White, C. T. Temporal numerosity:
I. Perceived number as a function of
flash number and rate. J. exp. Psychol., 1952, 44,
441-451.

Cheatham, P. G., & White, C. T. Temporal numerosity:
III. Auditory perception of number.
J. exp. Psychol., 1954, 47, 425-428.

Ellingson, R. J. Brain waves and problems of psychology.
Psychol. Bull., 1956, 53, 1-34.

Fessard, A. Brain potentials and rhythms: Introduction.
In H. W. Magoun (Ed.), Handbook of
physiology. Vol. 1. Washington, D. C.: American
Physiological Society, 1959. Pp. 255-259.

Forsyth, D. M., & Chapanis, A. Counting repeated
light flashes as a function of their number, their
rate of presentation, and retinal location stimulated.
J. exp. Psychol., 1958, 56, 385-391.

François, M. Contribution à l’étude du sens du
temps: La température interne, comme facteur de
variation de l’appréciation subjective des durées.
Année psychol., 1927, 28, 186.

Helmholtz, H. Helmholtz’s treatise on physiological
optics. In H. Helmholtz, Handbuch der Physiologischen
Optik, Vol. 2. (3rd ed. trans. by J. P. C.
Southall) New York: American Institute of Physics,
Optical Society of America, 1924. Pp. 311-312.

Hirsh, I. J., & Sherrick, C. E., Jr. Perceived order
in different sense modalities. J. exp. Psychol., 1961,
62, 423-432.

Hoagland, H. Pacemakers in relation to aspects of
behavior. New York: Macmillan, 1935.




James, W. The principles of psychology. Vol. 1. 
New York: Holt, 1890. Pp. 605-642. 

Le Grand, Y. Sur le rhythme apparent du papillotement.
Acad. Sci., Paris, Comptes Rendus, 1937, 204,
1590.

Lichtenstein, M. Phenomenal simultaneity with irregular
timing of components of the visual stimulus.
Percept. mot. Skills, 1961, 12, 47-60.

Lichtenstein, M., & White, C. T. Relative visual
latency as a function of retinal locus. J. Opt. Soc.
Amer., 1961, 51, 1033-1034.

Lindsley, D. B. Psychological phenomena and the
electroencephalogram. EEG clin. Neurophysiol.,
1952, 4, 443-456.

Lindsley, D. B. Higher functions of the nervous
system. Annu. Rev. Physiol., 1955, 17, 311-338.

McReynolds, P. Thinking conceptualized in terms
of interacting moments. Psychol. Rev., 1953, 60,
319-330.

Magoun, H. W. (Ed.) Handbook of physiology.
Washington, D. C.: American Physiological
Society, 1959.

Miller, G. A., & Garner, W. R. Effect of random
presentation on the psychometric function:
Implications for a quantal theory of discrimination.
Amer. J. Psychol., 1944, 57, 451-467.

Monnier, M. Retinal, cortical and motor responses
to photic stimulation in man (retino-cortical time
and opto-motor integration time). J. Neurophysiol.,
1952, 15, 469-486.

Mundy-Castle, A. C. An appraisal of electroencephalography
in relation to psychology. J. Nat.
Inst. Personnel Res., Monogr. Suppl., 1958, No. 2.

Murphree, O. D. Maximum rates of form perception
and the alpha rhythm: An investigation and
test of current nerve net theory. J. exp. Psychol.,
1954, 48, 57-61.

Pace, H. J. An investigation of the relation between
the perception of visual numerosity and critical
threshold for flicker-fusion. Unpublished doctoral
dissertation, New York University, 1957.

Piéron, H. The sensations: Their functions, processes
and mechanisms. London: Frederick Muller, 1952.

Pitts, W., & McCulloch, W. S. How we know
universals: The perception of auditory and visual
forms. Bull. math. Biophys., 1947, 9, 127-147.

Schilder, P. Mind: Perception and thought in their
constructive aspects. New York: Columbia Univer.
Press, 1942.

Sherrington, C. S. The integrative action of the
nervous system. New Haven: Yale Univer. Press,
1906.

Siebert, W. M., & Communications Biophysics
Group. Processing neuroelectric data. (Tech. Rep.
351) Cambridge: Massachusetts Institute of Technology,
Research Laboratory of Electronics, 1959.

Stroud, J. M. The moment function hypothesis.
Unpublished master’s thesis, Stanford University, 1948.

Stroud, J. M. The fine structure of psychological
time. In H. Quastler (Ed.), Information theory in
psychology. Glencoe, Ill.: Free Press, 1955.
Pp. 174-207.

Sweet, A. L. Temporal discrimination by the human
eye. Amer. J. Psychol., 1953, 66, 185-198.

Teuber, H. L. Perception. In H. W. Magoun (Ed.),
Handbook of physiology. Vol. 3. Washington,
D. C.: American Physiological Society, 1959. Pp.
1595-1668.

Walter, W. G. The twenty-fourth Maudsley lecture:
The functions of electrical rhythms in the brain.
J. ment. Sci., 1950, 96, 1-31.

Walter, W. G. Intrinsic rhythms of the brain. In
H. W. Magoun (Ed.), Handbook of physiology.
Vol. 1. Washington, D. C.: American Physiological
Society, 1959. Pp. 279-298.

White, C. T., & Cheatham, P. G. Temporal numerosity:
IV. A comparison of the major senses. J.
exp. Psychol., 1959, 58, 441-444.

White, C. T., Cheatham, P. G., & Armington, J. C.
Temporal numerosity: II. Evidence for central factors
influencing perceived number. J. exp. Psychol.,
1953, 46, 283-287.

Wiener, N. Cybernetics. New York: Wiley, 1948.
Pp. 165-166.

Woodworth, R. S. Experimental psychology. New
York: Holt, 1938.


(Received April 16, 1963) 


Vol. 77, No. 13 


Whole No. 576, 1963 


Psychological Monographs: General and Applied 


TEMPORAL DISCRIMINATION AND THE
INDIFFERENCE INTERVAL:
IMPLICATIONS FOR A MODEL OF THE “INTERNAL CLOCK” ¹


MICHEL TREISMAN 
Institute of Experimental Psychology, University of Oxford 


Temporal discrimination was investigated by the methods of production,
reproduction, constant stimuli, single stimuli, and estimation. The Weber
function was found to give a good fit to the relation between ΔT and T;
evidence was obtained that a dip in the Weber function at a short interval
is not an essential feature of time estimation but may be due to the
development of rhythmic modes of response or other factors. Productions or
reproductions of intervals tend to lengthen during the course of a session
at a rate proportionately greater for short intervals; but estimates tend
to shorten. A model for the “internal clock” is described, based on a
pacemaker, counter, store, and comparator with functional relations tending
to reduce error, and it is shown to provide explanations for the Weber
function, the indifference interval, overestimation of short and
underestimation of long intervals, the features of “lengthening,”
assimilation, and other findings.


Despite the long history of research on
the estimation of time, some of the
oldest problems in this field, such as the ap- 
plicability of Weber’s law to the difference 
threshold function or the causation of the 
“indifference interval,” are still unsolved. This 
is unfortunate since an accurate knowledge of 
the input-output relations shown by subjects 
estimating time is essential for the develop- 
ment of a satisfactory model of the under- 
lying time keeping mechanisms. The experi- 
ments described in this monograph attempt to 
clarify some of the features of time estima- 
tion; a model is then developed and the pre- 
dictions it makes are compared with the ex- 
perimental data. 

Does Weber’s law apply to the temporal
difference threshold function? Glass (see
Woodrow, 1930) examined intervals of A-15.4
seconds and found that it did, but most later
investigators have disagreed with him. In a
number of cases the Weber fraction has been
found to have a minimal value at some short
interval: Mach obtained a minimum of 5%
at .375 second; Vierordt obtained a minimum
of 3% between 1 and 1.5 seconds (Boring,
1942); and Woodrow (1930) obtained a minimum
of 7-8% at .6 second, the Weber fraction
rising to 10-11% at .2 second and to
16-17% for intervals between 4 and 30 seconds.
Gilliland and Humphreys (1943)
found that error, expressed as a percentage,
increased as temporal duration decreased between
9 and 180 seconds; Henry (1948) obtained
a similar change in the Weber fraction
for intervals of .480-.032 second; and Fraisse
(1948) reported that for ranges of .2-1.5 and
.3-12.0 seconds the Weber fraction was maximal
towards the middle of the range of times
presented and decreased at the extremes.

¹ I would like to thank R. C. Oldfield for his encouragement
and for the provision of research facilities,
and M. Bulmer for reading the manuscript.

The constant errors of estimates of temporal
durations are usually positive for short intervals
and negative for long ones, with a
transition point, the indifference interval, at
which the constant error is zero. In the early
studies varying values were obtained for this
interval, but the most common was a duration
of .6-.8 second. Wundt considered this a specific
physiological unit of time, the duration
required by the process of association or for
the movement of a leg in rapid walking.
Hollingworth (1913) explained the overestimation
of short, and underestimation of long,
intervals by his principle of “the central tendency
of judgment”; he argued:
In all estimates of stimuli belonging to a given range 
or group we tend to form our judgments around the
median value of the series—toward this mean each 
judgment is shifted by virtue of a mental set corre- 
sponding to the particular range in question [p. 47]. 


This hypothesis is supported by the occur- 
rence of overestimation of the shorter, and 
underestimation of the longer, intervals in a 
number of experiments in which considerably 
different ranges of intervals have been used. 
The term “overestimation” is sometimes held 
to be ambiguous. In this paper the term will 
refer to the case in which the constant error 
is positive. “Underestimation” will correspond
to a negative constant error.

In previous studies Fraisse (1948) used
ranges of .2-1.5 and .3-12.0 seconds, which
gave indifference intervals of 1.138 and 3.65
seconds, respectively; Turchioe (1948) used
.78-1.39 seconds; Gilliland and Humphreys
(1943) used 9-180 seconds; Clausen (1950)
used 5-15 seconds; Hirsh, Bilger, and Deatherage
(1956) used 1-16 seconds; and Swift
and McGeoch (see Weber, 1933) used from
30 seconds to 10 minutes. All obtained indifference
intervals in these experiments where
the subjects judged a range of durations.
Woodrow (1934) used a different group of
subjects for each interval in a range, so that
no subject was exposed to more than one
interval. Despite this he obtained the classical
pattern of over- and underestimation, with
an indifference interval at .6 second. Fraisse
(1948) attempted to reconcile this observa- 
tion with Hollingworth’s hypothesis by sug- 
gesting that when the subject is not presented 
with a range of intervals he will show an 
indifference interval determined by the com- 
bined effect of the durations “perçus dans 
toutes les occasions de la vie.” He accounts 
for Woodrow’s value of .6 second by assuming 
that only intervals between .4 and 2 seconds 
are “perceived.” 

The first experiment to be described here 
was intended to provide information for use in 
analyzing certain experiments on visual and 
auditory thresholds in which time was an independent
variable, and this intention largely
determined its design (Howarth & Treisman,
1958; Treisman, 1962b; Treisman & Howarth,
1959). However, its results are relevant to
both the problems discussed above.


EXPERIMENT 1 


In this experiment the (a) method of
reproduction (MR) and (b) method of production
(MP) were used. In the first a standard
time interval (T_s) is presented to a subject
who makes a reproduction (T_r) of it; in
the second the T_s is named verbally by the
experimenter, but is not presented immediately
before each production (T_p). For convenience
T_s will be used to represent the standard
interval in both procedures, whether it is
presented as a physical duration or named.
(A glossary of symbols is given in the Appendix.)


Apparatus and Procedure 


The T_s’s were presented to the subject by means
of a continuous visual stimulus, the light from a
neon bulb. The time intervals were produced by
electronic circuits employing Mullard Z803U valves.
The durations of T_s, T_r, and T_p were recorded to the
nearest millisecond by Dekatron counters.

Part A. MR was used in this part of the experiment.
The subject sat in front of a white screen,
18 × 20 inches in size, with the neon bulb at its center.
At the start of each trial the experimenter said
“Coming” and approximately a second later closed a
switch which immediately switched the neon light
on for the preselected duration of T_s. At the end
of this interval the light went out for 2 seconds,
then came on again and stayed on until the subject
depressed a reaction key which turned it off. This
interval was recorded as T_r.

The values of T_s used in each session were .5, 1,
2, 3, 4, and 9 seconds. A series of eight trials with
T_s = .5 second was followed by a series of eight
trials with T_s = 1 second, and so on until all six
intervals had been presented. Save for a short initial
practice period, each session was made up of three
such “blocks,” each consisting of six series of trials,
one series for each value of T_s. The series were always
given in the order above. Before each series
the subject was told how long T_s would be.

Part B. MP was used here. The procedure was the
same as in Part A save that T_s was omitted on each
trial. Before each series of trials the subject was told
what time interval to produce. On each trial the
experimenter’s initial warning was followed by the
light coming on. The interval, until it was extinguished
by the subject depressing the reaction key,
was taken as T_p.

The subjects were undergraduates and research
students. Eight did one session each in Part A.
Three subjects, two of whom had served in Part A,
did one session each in Part B.

The mean values of T_s produced by the apparatus
were .505, 1.007, 2.000, 3.004, 4.010, and 9.001 seconds.
As the standard deviations of the values of
T_s were only 1-2% as great as those of T_r and T_p
they will be disregarded. When values of T_s produced
in the third block of each session are compared
with those in the first block there is a mean decrease
of 3%.


Results 


The mean T_r or T_p corresponding to each
value of T_s was calculated for each block and
for each session. The means for each part of
the experiment are given in Table 1. The
standard deviation was calculated for each
series and the mean values are given. Changes
in T_r or T_p from Block 1 to Block 3 were
calculated as 100(T_3 - T_1)/T_s, where T_1 and
T_3 are the mean judgments for Blocks 1
and 3 for each session, and their mean values
for certain sessions are also given. The constant
error of the mean of each series of productions
or reproductions was expressed as a
proportion of the standard time interval; the
mean values of the proportionate constant
errors

PCEs = (T_r - T_s)/T_s

or

(T_p - T_s)/T_s

are shown in Figure 1.

The Weber fractions were calculated as
percentages (100ΔT/T), with the mean standard
deviation taken as the measure of the
just noticeable difference, ΔT, and the mean
production or reproduction for the whole session
taken as T. Since it is often found that
the linear generalization of Weber’s law,
ΔT = k(T + a), where k and a are constants,
gives a better fit to difference threshold data
than does Weber’s law in its classic form
(Ekman, 1959; Gregory, 1956), the best-fitting
function of this form (it will be referred
to as the “Weber function”) was calculated for
the results of each part of the experiment. The
functions obtained were:

for Part A:
ΔT = .068(T + 1.268)    [1]

for Part B:
ΔT = .057(T + 1.770)    [2]

for the combined data:
ΔT = .061(T + 1.571)    [3]

TABLE 1
Basic Data for Experiment 1

                                 T_s (seconds)
                     .5       1       2       3       4       9

Experiment 1A: mean reproduction
Block 1            .555    .940   1.802   2.836   3.914   8.545
Block 2            .593   1.137   1.971   3.008   3.974   8.837
Block 3            .625   1.105   2.037   3.086   4.130   9.055
Session            .591   1.061   1.937   2.977   4.006   8.812
Mean SD            .104    .125    .235    .322    .376    .661
Weber fraction     17.6    11.8    12.1    10.8     9.4     7.5
Mean increase in T_r from Block 1 to Block 3 (all sessions)
Second             .070    .165    .235    .250    .216    .510
%                    14      18      13      10       5       6

Experiment 1B: mean production
Block 1            .616   1.210   2.017   3.446   4.486  10.420
Block 2            .709   1.211   2.397   3.399   4.526  10.715
Block 3            .725   1.454   2.370   3.611   4.625  10.554
Session            .683   1.292   2.261   3.485   4.546  10.563
Mean SD            .120    .207    .230    .324    .324    .715
Weber fraction     17.6    16.0    10.2     9.3     7.1     6.8
Mean increase in T_p from Block 1 to Block 3 (two sessions showing lengthening)
Second             .206    .327    .602    .560    .431   -.164
%                    34      30      34      18      10      -2


Fig. 1. PCEs for Blocks 1, 2, and 3 of Experiments
1A and 1B are plotted against T_s. (The theoretical
curves were obtained by fitting Equation 13. For
Blocks 1, 2, and 3 of 1A they are: PCE =
-.087 + .082/T_s, PCE = -.036 + .120/T_s, and PCE
= -.017 + .130/T_s; and for Blocks 1, 2, and 3 of
1B they are: PCE = .102 + .064/T_s, PCE = .117 +
.139/T_s, and PCE = .148 + .176/T_s.)


Each regression was highly significant (p < 
.001) on analysis of variance, and there was 
no significant deviation from linearity of regression.
When the data were combined the
regression lines for productions and reproduc- 
tions did not deviate significantly from the 
common regression line. 
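
As a rough check on the arithmetic, the Weber fractions and the Weber function for Part A can be recomputed from the tabulated session means alone. The sketch below is not the original analysis: it assumes an ordinary least-squares line fitted to the six pairs of session mean and mean SD in Table 1, and the function name `fit_weber_function` is hypothetical.

```python
# Session means (T) and mean standard deviations (delta T) for
# Experiment 1A, taken from Table 1. Fitting to these six tabulated
# points is an assumption; the original fit may have used finer data.
T = [0.591, 1.061, 1.937, 2.977, 4.006, 8.812]
DT = [0.104, 0.125, 0.235, 0.322, 0.376, 0.661]


def fit_weber_function(t, dt):
    """Least-squares fit of dt = k * (t + a); returns (k, a)."""
    n = len(t)
    mean_t = sum(t) / n
    mean_dt = sum(dt) / n
    sxx = sum((x - mean_t) ** 2 for x in t)
    sxy = sum((x - mean_t) * (y - mean_dt) for x, y in zip(t, dt))
    k = sxy / sxx                      # slope of the regression line
    intercept = mean_dt - k * mean_t   # intercept equals k * a
    return k, intercept / k


k, a = fit_weber_function(T, DT)

# Weber fraction as a percentage: 100 * delta T / T for each interval.
weber_fractions = [100 * y / x for x, y in zip(T, DT)]

print(f"k = {k:.3f}, a = {a:.3f}")
print([round(w, 1) for w in weber_fractions])
```

Under these assumptions the fitted constants come out close to those of Equation 1, and the Weber fractions agree with the corresponding row of Table 1.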


Discussion 


The results of this experiment do not show 
a dip, or a maximum, in the curve relating
ΔT to T. In each part of the experiment the
results fit a Weber function, and departures 
from this function are not significant. In- 
spection of the individual data did not sug- 
gest that there was any tendency towards 
systematic departure from the regression line. 

The classical tendency for short intervals to
be overestimated, and long intervals underestimated,
is shown by the mean T_r’s. A similar
effect is shown by the T_p’s; here there is
no underestimation, but the overestimation of
the shorter intervals is proportionately greater
than that of the longer intervals. The “central
tendency” explanation would explain
these findings as an effect of the range of
intervals presented; it would follow that as
the subject grows more familiar with the
range during the course of the session the
overestimation of the shorter, and underestimation
of the longer, intervals should become
increasingly apparent, and the indifference 
interval should tend toward the median of the 
intervals presented. When, however, the re- 
sults for the successive blocks are compared a 
tendency is shown for the productions or 
reproductions of both shorter and longer time 
intervals to increase, so that all the constant 
errors tend to get larger during the course of 
the session, as shown in Figure 1. The pro- 
portionate increase is greater for the shorter 
intervals than for the longer; in consequence 
the indifference interval increases during the 
course of the session. In Part A it lies be- 
tween .5 and 1 second in Block 1, but is 
greater than 9 seconds in Block 3. Thus the 
changes predicted by the central tendency 
explanation are not shown. 
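
The shift of the indifference interval can be illustrated from the fitted curves quoted in the caption of Figure 1. A curve of the form PCE = a + b/T_s crosses zero at T_s = -b/a, so each block’s fitted curve implies an indifference interval. The following is only a sketch under that reading of the caption; the function names are hypothetical and the coefficients are those printed for Part A.

```python
# Coefficients (a, b) of the fitted curves PCE = a + b / T_s for
# Blocks 1-3 of Experiment 1A, as quoted in the Figure 1 caption.
FITTED_1A = {1: (-0.087, 0.082), 2: (-0.036, 0.120), 3: (-0.017, 0.130)}


def pce(t_judged, t_standard):
    """Proportionate constant error of a mean judgment."""
    return (t_judged - t_standard) / t_standard


def indifference_interval(a, b):
    """T_s at which a + b / T_s equals zero (needs a < 0 < b)."""
    if not (a < 0 < b):
        raise ValueError("curve does not cross zero at a positive T_s")
    return -b / a


for block, (a, b) in FITTED_1A.items():
    print(block, round(indifference_interval(a, b), 2))
```

The smoothed curves place the Block 1 crossing between .5 and 1 second and move it upward in each later block. The fitted Block 3 crossing falls below 9 seconds, whereas the observed Block 3 mean at 9 seconds is still positive, so the statement in the text about Block 3 rests on the observed means rather than on the fitted curve.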

The increase in the mean judgments during
the course of the session, which will be
referred to as the “lengthening effect,” was
significant, on analysis of variance, for both
the reproductions (p < .001) and the productions
(p < .01), and the greater increase at
the shorter intervals was significant (p < .05)
for the reproductions. Lengthening was shown
in 10 of the 11 sessions; the mean increases
were 25, 17, 11, 10, 9, 9, 5, and 4% in Part A
and 27, 15, and -5% in Part B. No similar
change was shown by the standard deviations.
Eson and Kafka (1952), who required subjects
to produce 15-second or 2-minute intervals
four times under varying conditions, and
Falk and Bindra (1954), who obtained 30
productions of 15 seconds from their subjects,
have reported a similar tendency for T_p to
increase.
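
The size of the lengthening effect in Part A can be recomputed from the block means in Table 1 using the measure 100(T_3 - T_1)/T_s. This is a sketch from the pooled means only, which is an assumption: the percentages printed in Table 1 are averages of per-session values, so they need not equal the figures obtained this way.

```python
# Standard intervals and mean reproductions for Blocks 1 and 3 of
# Experiment 1A, from Table 1 (all values in seconds).
T_S = [0.5, 1, 2, 3, 4, 9]
BLOCK_1 = [0.555, 0.940, 1.802, 2.836, 3.914, 8.545]
BLOCK_3 = [0.625, 1.105, 2.037, 3.086, 4.130, 9.055]

# Absolute increase from Block 1 to Block 3.
increase_sec = [round(t3 - t1, 3) for t1, t3 in zip(BLOCK_1, BLOCK_3)]

# Increase as a percentage of the standard interval.
increase_pct = [
    100 * (t3 - t1) / ts for t1, t3, ts in zip(BLOCK_1, BLOCK_3, T_S)
]

print(increase_sec)
print([round(p, 1) for p in increase_pct])
```

The absolute increases reproduce the “Second” row of Table 1, and the proportionate increase is largest at the shortest interval, which is the pattern described in the text.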

Because the intervals were not presented in 
counterbalanced order, two of the findings 
might be artifactual. If subjects were more 
variable in their performance early in a ses- 
sion, this might have produced a spuriously 
large increase in the Weber fractions for the 
shorter intervals, and so might have obscured 
a “dip” in the curve relating ΔT to T.
Similarly, if the rate of lengthening were the 
same for all intervals, but the effect was 
greater at the beginning of a session, then the 
proportionately greater increases shown by the 
shorter intervals might be due only to their
having always come first. In the next experiment
intervals are presented in counterbalanced order.


EXPERIMENT 2 


The failure to find a minimum in the Weber 
fraction curve at a short interval might be 
due to the lack of a sufficiently short interval 
in the range used; therefore, T_s = .25 second
was added in the present experiment.
On the central tendency hypothesis a shift in 
the range should cause a corresponding shift 
in the indifference interval; to test this, the 
range was restricted to intervals between .25 
and 3 seconds. To exclude any possibility 
that lengthening was due to the use of a visual 
stimulus, this was replaced by a tone. To see 
whether in a series of sessions the lengthening 
would change systematically, like a “practice 
effect,” or whether it would recur in similar 
fashion, like “fatigue,” each subject was ex- 
amined a number of times. 


Apparatus and Procedure 


Time intervals were presented as the duration of a 
continuous auditory stimulus—a 500-cycles-per-sec- 
ond, 60-70 decibel sensation level tone from a Muir- 
head audio-oscillator, given through Brown, Type K, 
moving coil earphones. For the rest the apparatus was 
the same as before. The experimenter started each 
trial by switching on a neon light which served as a 
warning. After 2 seconds the tone came on; it continued
for the preselected duration of T_s (.25, .5, 1.0,
or 3.0 seconds) and was followed by a silent interval
which varied at random between .5 and 2.5 seconds.
The tone then came on again; it continued until the
subject stopped it by depressing the reaction key.
When the method of production was used T_s was
omitted, the tone coming on .5-2.5 seconds after 
the light and continuing until stopped by the sub- 
ject. The subject, who was alone in a moderately 
illuminated, sound shielded room, sat facing the 
screen in which the warning light was mounted. 

Four subjects each did five sessions. After a short 
preliminary practice session four sessions were given 
on successive days, each lasting 50-60 minutes. Each 
session consisted of 16 series each of 10 trials. The
value of T_s and the method (MR or MP) were fixed
for each series. The subject was told before each
series what interval he was to produce or reproduce.
The order of time intervals was counterbalanced both
within and between sessions. For each time interval
an MR series was followed by an MP series, or the
reverse, the order being constant in a given session.
Thus for one subject the first session consisted of
the following 16 series: .25 MR, .25 MP, .5 MR,
.5 MP, 1.0 MR, 1.0 MP, 3.0 MR, 3.0 MP, 3.0 MR,
3.0 MP, 1.0 MR, 1.0 MP, .5 MR, .5 MP, .25 MR,
.25 MP. His later sessions began successively with
3.0 MP, 3.0 MR, and .25 MP. A second subject had
the same sessions in reverse order, and the other
two subjects had sessions beginning successively with
3.0, .25, .25, and 3.0 seconds.
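
The within-session counterbalancing described above amounts to an ascending pass through the intervals followed by a descending pass, with both methods given at each interval. A minimal sketch follows; the helper `session_order` is hypothetical, and the later sessions are assumed (not stated in the text) to follow the same scheme with the interval order or the method order reversed.

```python
def session_order(intervals, methods):
    """Ascending pass then descending pass through the intervals,
    with one series per method at each interval."""
    ascending = [(t, m) for t in intervals for m in methods]
    descending = [(t, m) for t in reversed(intervals) for m in methods]
    return ascending + descending


# First session of the first subject: MR before MP, shortest first.
first_session = session_order([0.25, 0.5, 1.0, 3.0], ("MR", "MP"))

# Sum of serial positions per interval; equal sums mean every
# interval has the same average position within the session.
position_sums = {}
for position, (t, _method) in enumerate(first_session):
    position_sums[t] = position_sums.get(t, 0) + position

for t, m in first_session:
    print(t, m)
```

Because every interval occupies the same average serial position, a steady drift over the session cannot favor one interval over another.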


Results 


The results, which were treated as before, 
are given in Table 2. The best-fitting Weber 
functions were determined for the productions 
and reproductions. These were: 


MR
ΔT = .069(T + .543)    [4]

MP
ΔT = .074(T + .624)    [5]

Each regression was significant on analysis of
variance (p < .001). Deviations from linearity
of regression were significant (p < .05)
for MP and not significant for MR. There
was some inhomogeneity of variance which
was ignored in making these analyses. The
mean Weber fractions for each subject are
shown in Figure 3, together with those for
Experiment 4.


TABLE 2
Basic Data for Experiment 2

                         T_s (seconds)
                    .25      .5     1.0     3.0

Mean reproduction
Block 1            .291    .549   1.100   2.918
Block 2            .312    .590   1.117   2.959
Session            .302    .569   1.108   2.938
Mean SD            .051    .077    .124    .237
Weber fraction     16.9    13.5    11.2     8.1
Mean increase in T_r from Block 1 to Block 2 (13 sessions showing lengthening)
Second             .027    .057    .021    .058
%                    10      11       3       2

Mean production
Block 1            .294    .544   1.226   3.112
Block 2            .305    .563   1.175   3.202
Session            .300    .553   1.200   3.157
Mean SD            .051    .086    .161    .271
Weber fraction     17.0    15.6    13.4     8.6
Mean increase in T_p from Block 1 to Block 2 (9 sessions showing lengthening)
Second             .045    .136    .117
%                    19      33      12       6




Discussion 


The relation of ΔT to the mean reproduction
or production is again adequately described
by the linear generalization of Weber’s
law; despite the addition of T_s = .25 second
no dip is found in the Weber fractions as T_s
decreases. The Weber fractions for each subject
in each session showed some variability,
but the only suggestion of any tendency for a 
dip to recur at a particular interval was a 
small decrease at .25 second (MP) shown in 
three sessions by one subject. 
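
One way to see why a Weber function leaves no room for a dip is to note that ΔT = k(T + a) implies a Weber fraction of 100k(1 + a/T), which falls steadily toward 100k as T increases when a is positive. The sketch below evaluates this for the MR constants of Equation 4 at the session mean reproductions of Table 2; the function name is hypothetical, and the comparison with the observed fractions is illustrative only.

```python
def weber_fraction_pct(t, k, a):
    """Weber fraction (percent) implied by dt = k * (t + a)."""
    return 100 * k * (t + a) / t


# Session mean reproductions (seconds) and observed Weber fractions
# for MR, both taken from Table 2.
MEAN_T_MR = [0.302, 0.569, 1.108, 2.938]
OBSERVED_MR = [16.9, 13.5, 11.2, 8.1]

# Constants of Equation 4 (MR): k = .069, a = .543.
predicted = [weber_fraction_pct(t, 0.069, 0.543) for t in MEAN_T_MR]

for t, p, o in zip(MEAN_T_MR, predicted, OBSERVED_MR):
    print(f"T = {t:.3f}  predicted = {p:.1f}%  observed = {o:.1f}%")
```

The predicted fractions decrease monotonically with T, as do the observed ones, so any reliable minimum at a short interval would show up as a departure from the fitted function.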

The standard deviations and Weber frac- 
tions are smaller than those obtained in Ex- 
periment 1. This appears to result from a 
practice effect: Weber fractions calculated for 
the first session alone are very similar to 
those obtained before (see Table 5). The 
standard deviations decreased in successive 
sessions to an extent which was significant 
on analysis of variance (p < .01). They
were also significantly larger (p < .01) with
MP. No effect of blocks on standard deviation
was found.

PCEs are again greater at the shorter intervals.
(The large PCE for 1 second with MP is
mainly due to one subject.) This effect was
significant on analysis of variance for three of
the subjects (p < .001 in each case) but not
for the fourth. The indifference interval lies
between 1 and 3 seconds for MR and is
greater than 3 seconds for MP. It does not
shift downwards as compared with Experiment 1,
as Hollingworth’s central tendency explanation
or adaptation-level theory (Helson,
1959) might predict. Despite the differences
in procedure there was no clear difference between
the mean T_r’s and T_p’s obtained.

Lengthening was again found, the average
increases being 8, 7, 5, and -2% for the four
subjects, the first two being significant
(p < .001 and p < .05). The lengthening
varied in an irregular fashion from session to
session. For two subjects it was most marked
in the first and last sessions, for one subject
in the first two sessions, and for the remaining
subject in the last two sessions. With MR a
mean increase was shown in 13 of the 16 sessions;
with MP in 9. An effect shown by all
the subjects was a tendency for the mean PCE
to increase in successive sessions (p < .001,
.025, .05, and .1 for the different subjects),
the mean values, expressed as percentages, in
the successive sessions being 9, 11, 14, and
16%. A similar increase in the constant errors
in the second of two MR sessions can be seen
in data reported by Turchioe (1948).

The proportionate increases are greater at 
the shorter intervals. This was significant 
(p < .05) on analysis of variance of the in- 
creases in all the sessions. A possible explana- 
tion for it is that the rate of increase is con- 
stant, but is combined with a central tendency 
effect of the range. Another possibility is that 
fatigue develops during a session, slowing down
the subject’s response, when he terminates T_p
or T_r, by a constant amount. This would appear
as a proportionately greater increase in
the shorter intervals. The next experiment 
examines these explanations. 


EXPERIMENT 3 


To exclude range effects, only one value of 
Ts was used in each session. To see whether 
any slowing of the motor response as such 
occurred, reaction times were recorded. Oléron 
(1952); Hirsh, Bilger, and Deatherage 
(1956); and Goldstone, Boardman, and 
Lhamon (1959) have described effects of 
auditory intensity on time judgments. To 
examine this, three different intensities of the 
tone were employed. 


Apparatus and Procedure 


The apparatus was set up as in Experiment 2. At 
the beginning of each session the subject’s absolute 
threshold for the 500-cycles-per-second tone was 
determined by the method of limits. In the session 
proper, 180 trials were given, divided into 6 identical 
blocks, each consisting of 3 series of 10 trials. The 
stimulus tone was at constant intensity for each 
series of 10 trials, intensities of 20-, 50-, and 80-deci- 
bel sensation level being used for the 3 series in each 
block. In each session the order in which they were 
given was the same for each block. When reaction 
times were measured the subject was asked to de- 
press the key as soon as the tone started. On these 
trials the warning light was followed, after an inter- 
val varying randomly between .5 and 2.5 seconds, by 
the onset of the 50-decibel tone. Twenty reaction 
times were recorded at the beginning of each session, 
after the first half session (3 blocks), and at the end 
of the session. 

Eight subjects were used, three being research students 
and five Air Force men. Four subjects had 
MR only and four MP only; four had Ts = .5 second 
and four had Ts = 3.0 seconds; four had the 
stimulus tones in the order 20, 50, and 80 decibels in 
each block and four in the reverse order, each combination 
occurring once. With Ts = .5 second sessions 
lasted 30-40 minutes, and with Ts = 3.0 seconds they 
lasted 40-50 minutes. In MP sessions the subject was 
given two or three presentations of Ts before the 
session proper commenced. 


Results 


The mean values of Tr and Tp in successive 
blocks are given in Table 3. The individual 
results and the means for each tone intensity 
for Ts = .5 second are shown in Figure 2. 
The percentage increases in Tr or Tp in the 
second, as compared with the first, half session 
are given for each subject in Table 3. Significance 
levels for the effect of blocks on the 
means were p < .001 for Ts = .5 second, and 
p < .1 for Ts = 3.0 seconds. Mean reaction 
times for all subjects at the beginning, middle, 
and end of the session were .266, .263, and 
.263 second, respectively. 


Discussion 


The changes from the first to the second 
half sessions are proportionately greater for 
the shorter Ts. They are mainly positive, 
though one subject showed a consistent de- 
crease in his reproductions. Disregarding sign, 


TABLE 3 
BASIC DATA FOR EXPERIMENT 3 

Ts                        Block 
               1      2      3      4      5      6 
.5 second 
  MR         .492   .491   .496   .497   .508   .514 
  MP         .395   .364   .322   .542   .576   .525 
3.0 seconds 
  MR        2.882  2.906  2.993  2.998  2.909  3.018 
  MP        2.336  2.199  2.207  2.260  2.297  2.428 

Increases in Tp or Tr in second half session (%) 

Session          Ts (seconds) 
              .5          3.0 
MR          15, −11      3, 0 
MP          87, 22       5, 2 

Mean PCE for each session (%) 

MR          11, −11     −2, −2 
MP          444, so     −23, −25 


Fig. 2. Part A shows the mean productions and 
reproductions for each subject for each block for 
Ts = .5 second in Experiment 3. Part B shows the 
mean Tr or Tp for each intensity of the stimulus tone 
for Ts = .5 second. Part C shows the means for each 
subject for Ts = 3.0 seconds. 


the magnitude of the changes is significantly 
greater when Ts = .5 second (p = .028, two- 
tailed, on the Mann-Whitney U test). The 
mean reaction times do not increase during 
the session, as might have been expected if 
lengthening were due to motor fatigue (the 
largest individual increase in reaction time was 
shown by the subject whose reproductions de- 
creased). The results of this experiment con- 
firm that proportionately greater changes oc- 
cur at short intervals, as found in Experi- 
ments 1 and 2, and suggest that this cannot be 
attributed to an effect of the use of ranges of 
intervals. 
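
The Mann-Whitney comparison can be checked with a short sketch. It assumes that the eight values tested were the per-subject percentage changes listed in Table 3 (15, −11, 87, 22 at Ts = .5 second; 3, 0, 5, 2 at Ts = 3.0 seconds), taken without regard to sign; the text does not state this explicitly. The exact two-tailed probability is obtained by enumerating every way of assigning four of the eight values to the first group.

```python
from itertools import combinations

def mann_whitney_exact(x, y):
    """Return (U for x, exact two-tailed p) by full enumeration of group assignments."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)

    def u_stat(idx_x):
        xs = [pooled[i] for i in idx_x]
        ys = [pooled[i] for i in range(len(pooled)) if i not in idx_x]
        # Count pairs in which the x value exceeds the y value (ties count one half).
        return sum((a > b) + 0.5 * (a == b) for a in xs for b in ys)

    observed = u_stat(tuple(range(n1)))
    expected = n1 * n2 / 2
    all_u = [u_stat(c) for c in combinations(range(len(pooled)), n1)]
    extreme = sum(abs(u - expected) >= abs(observed - expected) for u in all_u)
    return observed, extreme / len(all_u)

# Magnitudes of the changes, disregarding sign (values assumed from Table 3).
short_interval = [abs(v) for v in (15, -11, 87, 22)]   # Ts = .5 second
long_interval = [abs(v) for v in (3, 0, 5, 2)]         # Ts = 3.0 seconds

u_observed, p_two_tailed = mann_whitney_exact(short_interval, long_interval)
print(u_observed, round(p_two_tailed, 4))
```

With these inputs every change at the short interval exceeds every change at the long interval, so only 2 of the 70 possible assignments are as extreme as the one observed, giving a probability close to the reported .028.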

For Ts = 3.0 seconds the different intensities 
of the tone had no detectable effect on 
the judgments made, but for Ts = .5 second 
there was a highly significant effect (p < .001) 
which was shown by all the subjects at this 
interval: as the intensity of the tone increased 
the mean length of Tr or Tp decreased. This 
finding is discussed later. 


EXPERIMENT 4 


No evidence of a systematic tendency for 
the Weber fraction to decrease to a minimum 
at a short interval has been found, though 
there have been intermittent reports of such 
dips for nearly a century. This suggests that 
their occurrence might not be a necessary con- 
sequence of the mode of operation of the basic 
mechanism measuring time but an artifact 
of occasional features of the experimental 
situation. In the experiments reported here 
the subject has been required only to termi- 
nate an interval, but studies which have shown 
minima have frequently had him delimit an 
interval by making two taps (Boring, 1942; 
Woodrow, 1930). This suggests that dips 
might be attributable to the mechanics of 
response. When a subject must make two 
responses at a fairly long interval they are, 
presumably, independently organized; but 
when the interval is short a “rhythmic” double 
response with much reduced variance might 
be produced. This might occur if the neural 
control of movement and the mechanical and 
physical properties of the subject’s limbs and 
of the device he is required to manipulate 
interact to provide more reliable cues to time 
than does the basic time keeping mechanism 
at these short intervals. If rhythmic responses 
have this property, they might occur in some 
form and be used as an ancillary cue in any 
time estimation procedure, but their use would 
be favored when two similar responses are 
required. To test the possible importance of 
this factor an experiment was designed in 
which the subjects had both to begin and end 
Tp or Tr, making a similar response each time. 


Apparatus and Procedure 


The same apparatus was used as in Experiment 2, 
modified so that the subject had to tap a lightly 
sprung reaction key both to begin and end Tr or Tp. 

Part A. Each subject did two sessions: half the 
subjects had MP in the first session and MR in the 
second, and the other half had them in the reverse 
order. An MR session consisted of four, and an 
MP session of six, blocks each made up of four series 
of 10 trials. Ts was constant for each series of trials: 
in “ascending blocks” Ts was .25, .5, 1.0, and 3.0 seconds 
in successive series; in “descending blocks” the 
order was reversed. Ascending and descending blocks 
were given alternately in each session. Half the 
subjects started each session with an ascending block 


and half with a descending block. Sessions lasted 
30-40 minutes. 

Four subjects were used: one had served in Ex- 
periment 1 and one in Experiment 2; the other two 
were naive. 

Part B. Two of the subjects were each given a 
further MP session. The tone was omitted so that 
each interval was marked out only by a click at the 
beginning and end, produced when the subject tapped 
the reaction key. Each session consisted of four 
blocks, each block containing five series of trials, 
Ts being .1, .25, .5, 1.0, and 3.0 seconds, in that or 
the reverse order. The sessions lasted 30-40 minutes. 


Results 


The mean values of Tr, Tp, and their standard 
deviations are given in Table 4 for both 
parts of the experiment. The Weber fractions 
for each session are plotted in Figure 3. 
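
As a reading aid, the sketch below shows how a Weber fraction of this kind can be computed, assuming it is the standard deviation expressed as a percentage of the mean interval. The inputs are the MR means and standard deviations for Experiment 4A as read from Table 4. Because the tabled Weber fractions are means over sessions, the ratio of the tabled means should only approximate them.

```python
def weber_fraction_pct(sd, mean):
    """Standard deviation as a percentage of the mean interval."""
    return 100.0 * sd / mean

# Mean Tr and mean SD for MR in Experiment 4A (Table 4), Ts = .25, .5, 1.0, 3.0 seconds.
mean_tr = [0.267, 0.536, 1.029, 2.660]
mean_sd = [0.040, 0.065, 0.110, 0.216]

fractions = [round(weber_fraction_pct(s, m), 1) for s, m in zip(mean_sd, mean_tr)]
print(fractions)
```

The values obtained in this way lie within a few tenths of a percentage point of the tabled mean Weber fractions and show the same decline with increasing Ts.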




Discussion 


The aim of this experiment was to see 
whether an increase in the factors favoring 
rhythmic responding would produce minima 
in the Weber fraction curve. In Figure 3 
ΔT/T is plotted against Tr or Tp for Experiments 
2 and 4. The 8 curves from Experiment 
2 show one dip; five of the eight 
sessions in Experiment 4A show dips at short 
intervals, the Weber fraction reaching a mini- 
mum at a short interval in three of them. 
This difference might be at least partly due 
to the latter curves being based on fewer 


Fig. 3. The Weber fractions, expressed as percentages, 
of each subject in Experiments 2, 4A, and 4B 
are plotted against the corresponding mean productions 
or reproductions (abscissa: Tr or Tp, in seconds). 


TABLE 4 
BASIC DATA IN EXPERIMENT 4 

Experiment 4A                   Ts (seconds) 
                         .25      .5     1.0     3.0 
Mean Tr                 .267    .536   1.029   2.660 
Mean SD                 .040    .065    .110    .216 
Mean Weber fraction     15.0    12.3    10.8     8.2 
Mean Tp                 .426    .760   1.243   3.049 
Mean SD                 .060    .105    .149    .303 
Mean Weber fraction     13.6    14.2    12.6    10.3 

Experiment 4B                   Ts (seconds) 
                         .10     .25      .5     1.0     3.0 
Mean Tp                 .255    .465    .729   1.344   3.103 
Mean SD                 .027    .068    .107    .198    .378 
Mean Weber fraction     10.7    14.2    15.6    14.7    12.1 

Parts A and B 
Increase from Block 1 to Block 4 (six sessions showing lengthening) 
Second                  .128    .247    .434    .463 
%                         38      47      56      19 


observations; for Experiment 2 each point is 
based on 80 trials, and for 4A on 40 (MR) 
or 60 (MP). If in Experiment 2 mean Weber 
fractions are calculated from successive pairs 
of sessions, so that each point is based on 40 
trials, nine dips appear in the resulting 16 
curves (mean extent 2.3%), but there are 
still no minima. Further evidence that the 
minima found here are not due solely to in- 
creased variability is provided by Table 5, 
which allows the mean Weber fractions for 
the first sessions of Experiments 1, 2, and 4A 
to be compared. In the last experiment the 
mean Weber fractions are reduced, and this 
is mainly at the shorter intervals. 

The difference from Experiment 2 was 
mainly due to Subjects D and M. It seemed, 
therefore, of interest that both the other sub- 


jects volunteered the information that they 
had been “concentrating on the tone” at the 
short intervals. This suggested that the use 
of filled intervals might not be optimal for the 
development of rhythmic patterns of response 
for some subjects. For this reason in Exper- 
iment 4B these two subjects were each 
given an MP session in which unfilled inter- 
vals were produced. For each subject a mini- 
mum Weber fraction at the shortest interval 
now appeared. (It may be of interest that 
Subject A, who showed a dip at 1 second in 
each MP session, reported that 1 second was 
“easy” because she could use a simple move- 
ment of the whole forearm in making the 
responses.) These results are in accord with 
the hypothesis that features of the experi- 
mental procedure which appear likely to favor 


TABLE 5 
COMPARISON OF WEBER FRACTIONS (FIRST SESSION ONLY) OBTAINED IN EXPERIMENTS 1, 2, AND 4A 


Experiment    MR (Ts, seconds)           MP (Ts, seconds) 
              .25  .5  1.0  3.0          .25  .5  1.0  3.0 
1             17.6 11.8 10.8             17.6 16.0 9.3 
2             21.1 16.8 14.6 9.9         16.8 17.0 16.5 11.2 
4A            10.2 12.4 11.4 9.4         14.9 11.4 11.4 9.7 
5             15.3 12.0 11.9 7.6         21.4 12.3 15.6 6.1 


rhythmic responding may cause minimal val- 
ues of the Weber fraction to appear at short 
intervals. 

The usual algebraically larger values of 
PCE are found at the shorter intervals, with 
an indifference interval between 1 and 3 sec- 
onds for MR, and above 3 seconds for MP. 
The mean values of PCE for each session, 
expressed as percentages, were 4A MR: 7, 6, 
−1, −7; MP: 85, 83, 5, −25; 4B MP: 95 
and 34%. Like Experiment 1, but unlike 
Experiment 2, larger constant errors are given 
by MP than by MR. 

Lengthening was shown by all subjects in 
the first session, by one subject in the second 
session, and by one subject in 4B. The mean 
increases from the first to the last blocks 
were 4A MR: 21, 13, −1, −6; MP: 114, 
84, 49, −31; 4B MP: 38 and −2%. 


EXPERIMENT 5 


In Experiment 3 some evidence that length- 
ening was not due to an increase in motor 
response time was obtained, and Experiment 4 
showed that decreases in the Weber fraction 
might be due to the development of rhythmic 
response patterns. It thus seemed of interest 
to examine temporal discrimination when 
motor responses are avoided by the use of 
methods requiring comparisons. In this ex- 
periment the method of constant stimuli 
(MCS) and the method of single stimuli 
(MSS) are used. In MCS two time intervals, 
a standard (Ts) and a variable (Tv), are 
given in that order and the subject must 
decide whether the second is longer (L) or 
shorter (S) than the first. In MSS Tv only is 
presented, but the procedure is otherwise the 
same. 


Apparatus and Procedure 


The same apparatus was used as in Experiment 2, 
modified to allow two preselected intervals to be 
presented in succession. In MCS each trial started 
when the warning light came on; 2 seconds later Ts 
was presented; and after a 2-second interstimulus 
interval it was followed by Tv. MSS trials were 
similar save that only Tv was presented. The subject 
responded by pressing one of two keys marked 
“longer” and “shorter.” 

Each session was made up of two identical blocks, 
each consisting of four series of 40 trials. Values of 
Ts were assigned to the four series in a block in one 


of four orders: .25, .5, 1, and 3 seconds; 3, 1, .5, 
and .25 second; .5, .25, 3, and 1 second; or 1, 3, .25, 
and .5 second. The same order was used in the two 
blocks of each session. Each subject did two sessions, 
having one order in the first session and its reverse 
in the second. In each series Tv took one of five 
values equally often in a random order determined 
from random-number tables. These five values (they 
will be referred to as A, B, C, D, and E, where A is 
the shortest, E the longest, and C = Ts) were equally 
spaced, the step interval being 12.5 milliseconds for 
Ts = .25 second, 25 milliseconds for Ts = .5 second, 
50 milliseconds for Ts = 1 second, and 125 milliseconds 
for Ts = 3 seconds. 
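
The five variable stimuli for each standard follow directly from the step intervals just given. The sketch below lists them; the function and dictionary names are illustrative only.

```python
# Step interval in milliseconds for each standard Ts in seconds (from the text).
STEP_MS = {0.25: 12.5, 0.5: 25.0, 1.0: 50.0, 3.0: 125.0}

def variable_stimuli(ts):
    """Return the five equally spaced values A-E (seconds), with C equal to Ts."""
    step = STEP_MS[ts] / 1000.0
    return [round(ts + k * step, 4) for k in range(-2, 3)]

for ts in STEP_MS:
    print(ts, variable_stimuli(ts))
```

Note that the step is 5% of the standard for the three shorter values of Ts but about 4.2% for Ts = 3 seconds, so the stimulus range is relatively narrower at the longest interval.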

In Part A MCS was used, and each session lasted 
about 1½ hours. In Part B MSS was used, and each 
session lasted about 1 hour. In MSS sessions Ts was 
presented once or twice before each series to reduce 
the apprehension some subjects felt about their ability 
to do the task. In each part of the experiment eight 
sessions were obtained, two from each of four sub- 
jects. Seven subjects were used, one serving in both 
parts of the experiment. Two subjects had served 
in Experiment 4. 


Results 


Probit analysis (Finney, 1952) was used to 
determine the point of subjective equality 
(PSE) and standard deviation for each series 
of trials. The results are shown in Table 6 
and in Figure 4. A Weber function was fitted 
for each part of the experiment. The best- 
fitting functions obtained were: 


MCS 
ΔT = .075(T + .501)    [6] 


MSS 
ΔT = .057(T + .900)    [7] 
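
To show what Equations 6 and 7 imply, the sketch below evaluates each fitted function at the four standards and expresses ΔT as a percentage of T. Only the two coefficients in each equation come from the text; the function and variable names are illustrative.

```python
def delta_t(t, k, c):
    """Linear generalization of Weber's law: delta T = k * (T + c)."""
    return k * (t + c)

# (k, c) for each method, taken from Equations 6 and 7.
FITS = {"MCS": (0.075, 0.501), "MSS": (0.057, 0.900)}

for method, (k, c) in FITS.items():
    for t in (0.25, 0.5, 1.0, 3.0):
        dt = delta_t(t, k, c)
        print(f"{method}  T = {t:4.2f} s  dT = {dt:.3f} s  dT/T = {100 * dt / t:.1f}%")
```

Because of the additive constant, the fitted ΔT/T falls as T increases and approaches k (7.5% or 5.7%) only at long intervals.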


Each function is significant (p < .001 in each 
case). There were no significant deviations 
from linearity of regression for Equation 6. 
Departures from linearity of regression were 
significant (p < .05) for Equation 7. In these 
analyses of variance some inhomogeneity was 
ignored. 
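
The idea behind the probit step can be sketched as follows. This is a simplified illustration, not Finney's procedure: it converts each proportion of "longer" responses to a normal deviate and fits an unweighted straight line, whereas probit analysis proper uses iterated weighted fitting. The proportions below are hypothetical and were generated from an assumed normal distribution with mean .49 second and standard deviation .06 second.

```python
from statistics import NormalDist

def probit_line(durations, p_longer):
    """Fit z = a + b*T by unweighted least squares; return (PSE, SD).

    Proportions must lie strictly between 0 and 1.
    """
    z = [NormalDist().inv_cdf(p) for p in p_longer]
    n = len(durations)
    mean_t = sum(durations) / n
    mean_z = sum(z) / n
    sxx = sum((t - mean_t) ** 2 for t in durations)
    sxz = sum((t - mean_t) * (zi - mean_z) for t, zi in zip(durations, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_t
    pse = -intercept / slope   # duration judged "longer" on half the trials
    sd = 1.0 / slope           # one normal deviate expressed in seconds
    return pse, sd

# Hypothetical series with Ts = .5 second (stimuli A-E) and assumed response proportions.
assumed = NormalDist(mu=0.49, sigma=0.06)
stimuli = [0.45, 0.475, 0.5, 0.525, 0.55]
proportions = [assumed.cdf(t) for t in stimuli]

pse, sd = probit_line(stimuli, proportions)
print(round(pse, 3), round(sd, 3))
```

The PSE is the duration at which the fitted line crosses z = 0, and the standard deviation is the reciprocal of its slope.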


Discussion 


The relation between the standard devia- 
tions and PSEs fits the linear generalization 
of Weber’s law; the similarity of Equations 6 
and 7 to Equations 4 and 5, despite the dif- 
ferences of procedure in the two experiments, 
is encouraging. However, examination of the 
Weber fractions calculated for each half ses- 
sion showed considerable variability. In 16 


TABLE 6 
BASIC DATA FROM EXPERIMENT 5 

Experiment 5A, MCS: mean PSE        Ts (seconds) 
                         .25      .5     1.0     3.0 
Block 1                 .250    .481    .957   2.889 
Block 2                 .248    .491    .966   2.855 
Session                 .249    .486    .962   2.872 
Mean SD                 .033    .056    .136    .247 
Weber fraction          13.2    12.4    14.0     8.3 
Mean increase in PSE from Block 1 to Block 2 (all sessions) 
Second                 −.002    .010    .009   −.034 
%                         −1       3       1      −1 

Experiment 5B, MSS: mean PSE 
Block 1                 .252    .494    .997   3.043 
Block 2                 .248    .495    .997   2.985 
Session                 .250    .495    .997   3.014 
Mean SD                 .040    .069    .133    .211 
Weber fraction          17.4    13.6    14.2     6.6 
Mean increase in PSE from Block 1 to Block 2 (all sessions) 
Second                 −.004    .001       0   −.058 
%                         −1       0       0      −2 


MCS half sessions minimal Weber fractions 
occurred at 3, 1, .5, and .25 second 9, 2, 3, and 
2 times, respectively. For the 16 MSS half 
sessions the corresponding figures were 12, 2, 
1, and 1. For comparison, MR and MP data 
for each session of Experiment 2 provide 32 
curves, and the corresponding figures for their 
minima are MR: 15, 1, 0, 0; MP: 13, 2, 0, 
and 1. 

One source of this greater variability may 
be the relatively small number of stimuli 


Fig. 4. PCEs for the first and second blocks of 
Experiments 5A and 5B are shown (ordinate: PCE; 
abscissa: Ts, in seconds; surviving panel label: 5(a), MCS). 


which could be given in each series if sessions 
were not to become too long. Under these 
conditions a short period of “random respond- 
ing” may greatly increase the standard devia- 
tion for the series. It is also possible that 
some subjects used regular or rhythmic move- 
ments as an ancillary mechanism reducing 
their variance at certain short intervals. Some 
subjects reported that they “mentally drew 
lines” when listening to the intervals; possibly 
this was accompanied by eye movements. A 
third possibility is that responses were influ- 
enced by sequential dependency effects. 
Sequential response dependencies would oc- 
cur if each response were determined not only 
by the duration of Tv and its relation to the 
preceding presentation of Ts in MCS, or to 
the corresponding “internal standard” in MSS, 
but also by previously presented variable 
stimuli, or the responses made to them. The 
presentation of extra stimuli has been found 
to produce “assimilation” or “contrast” effects 
on psychophysical judgments in many modali- 
ties (Guilford, 1954). Assimilation corre- 
sponds to a shift of the internal standard or 
PSE towards the extra stimulus; contrast is 
the opposite effect. Philip (1947) has described 
such effects with time judgments. He 
used MCS, with Ts = 1.01 second and durations 
of .78 second or 1.39 second as the extra 
stimuli. If these were given before Ts or 
were interpolated between Ts and Tv, they 
produced assimilation, i.e., PSE was raised if 
the extra stimulus was greater than Ts and 
lowered if it was less. Turchioe (1948), using 
MR with standard intervals of .78-1.39 second, 
also obtained assimilation when an extra 
stimulus was interpolated between Ts and Tr, 
but when it preceded Ts it produced contrast 
instead. Goldstone, Lhamon, and Boardman 
(1957) and Goldstone, Boardman, and 
Lhamon (1959) measured PSE for 1 second 
by the method of limits, no standard being 
presented, and found that it shifted in the 
direction of a preceding anchor. It is thus of 
interest to see whether in the present experiment 
judgments were affected by previously 
presented variable stimuli. 

To see whether sequential dependencies 
were present, P(L), the probability of the 
response longer, was calculated for each subject 
for Tv = C when the preceding Tv took 
the value A, B, D, or E: PA,C(L), etc. These 
results are shown in Table 7. It can be seen 
that four subjects, Subject A in Part A, and 
Subjects G, M, and J in Part B, show apparent 
assimilation, the mean of PA,C(L) and 
PB,C(L) considerably exceeding the mean of 
PD,C(L) and PE,C(L). Chi square, calculated 
for each of these subjects and combined, was 
significant (p < .005). This could be an ef- 
fect of preceding stimulus presentations as 
such, but it could also be explained as a tend- 


TABLE 7 


ANALYSIS OF SEQUENTIAL DEPENDENCIES IN 
INDIVIDUAL SUBJECTS 


Subject           Preceding Tv 
              A      B      D      E 


Experiment 5A, MCS 


T            .33    .50    .69    .74 
M            .58    .50    .62    .48 
E            .63    .82    .76    .60 
A            .74    .62    .58    .41 


Experiment 5B, MSS 


(entries illegible) 

TABLE 8 


ANALYSIS OF SEQUENTIAL DEPENDENCIES FOR 
FOUR SUBJECTS 


                  Preceding Tv 
              A      B      D      E      p 


Preceding response 


S OA ES OL 
L OEO SARS T 01 


ency to alternate responses, since A and B 
will more usually have evoked the response 
shorter and D and E longer. To examine this 
PC(L) was calculated separately for preceding 
S and L responses for each preceding value 
of Tv for these four subjects taken together. 
The combined data are given in Table 8. The 
effect is clearly present when the preceding 
response is controlled, so that it can be at- 
tributed to the stimulus as such. If the pre- 
ceding response has an effect, Table 8 sug- 
gests that it is in the direction of persevera- 
tion, since for each preceding stimulus value 
PL(L) > PS(L), but this difference was not 
significant. 
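
The tabulation behind Tables 7 and 8 can be sketched as below. The trial sequence is hypothetical and far shorter than a real series; the function name and data layout are assumptions made for the example. Each trial is a pair giving the variable stimulus (A to E) and the response (L or S), and the function returns the proportion of L responses to stimulus C for each value of the preceding stimulus.

```python
from collections import defaultdict

def p_longer_after(trials, current="C"):
    """Proportion of 'L' responses to `current`, keyed by the preceding stimulus."""
    counts = defaultdict(lambda: [0, 0])  # preceding stimulus -> [n_longer, n_total]
    for (prev_stim, _prev_resp), (stim, resp) in zip(trials, trials[1:]):
        if stim == current:
            counts[prev_stim][0] += resp == "L"
            counts[prev_stim][1] += 1
    return {prev: n_l / n for prev, (n_l, n) in counts.items()}

# Hypothetical sequence of (stimulus, response) pairs, not data from the experiment.
sequence = [
    ("A", "S"), ("C", "L"), ("E", "L"), ("C", "S"),
    ("A", "S"), ("C", "L"), ("E", "L"), ("C", "L"),
]

result = p_longer_after(sequence)
print(result)
```

Separating the effect of the preceding stimulus from that of the preceding response, as in Table 8, would require keying the counts on both elements of the preceding trial rather than on the stimulus alone.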

Subject T shows an increase in PC(L) as 
the preceding Tv increases. This was only 
clearly present in the responses at Ts = .25 
or .5 second. For these two intervals, as Tv 
increased from A to E, PC(L) was successively 
.27, .41, .68, and .87 (p < .005). However, 
the data were insufficient to determine whether 
this was “contrast” or “response persevera- 
tion.” 

Assimilation would cause the internal standard 
corresponding to recent or remote presentations 
of Ts to vary as the length of the 
preceding Tv varies. This would effectively 
add to the response variance. A different explanation 
for the changes in PC(L) shown by 
these four subjects could be offered. They 
may have been following a decision strategy 
in which a given presentation of Tv is compared 
both with Ts, or the internal standard, 
and with the immediately preceding Tv, and 
the larger difference determines the response. 
This would tend to increase P(L) for D and 
E, and decrease it for A and B, and would 
thus decrease the standard deviation of the 
responses. These two possibilities cannot be 
distinguished on the basis of the present 
data. If either or both of them occur, appearing 
intermittently or only at certain values of 
Ts, they would tend to inflate or reduce the 
variance and would thus contribute to the 
appearance of dips and minima in the Weber 
fraction curve. Together with the possible 
ancillary use of rhythmic movements they 
might help to explain the occasional occur- 
rence of minima with the method of compari- 
son (Blakely, see Woodrow, 1951). 
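
The alternative decision strategy described above can be written out as a small rule. This is a sketch of one possible reading: the current variable stimulus is compared with the standard and with the preceding variable stimulus, and whichever difference is larger in magnitude decides the response. The handling of an exact tie at zero is an arbitrary assumption.

```python
def respond(tv, standard, previous_tv):
    """Return 'L' or 'S' according to the larger of the two differences."""
    diff_standard = tv - standard
    diff_previous = tv - previous_tv
    if abs(diff_standard) >= abs(diff_previous):
        decisive = diff_standard
    else:
        decisive = diff_previous
    # Assumption: a decisive difference of exactly zero is scored as 'S'.
    return "L" if decisive > 0 else "S"

# Stimulus C (equal to a .5-second standard) after a short and after a long stimulus.
print(respond(0.50, 0.50, 0.45))  # preceded by A
print(respond(0.50, 0.50, 0.55))  # preceded by E
```

Under this rule a stimulus equal to the standard is called longer after a short stimulus and shorter after a long one, which would produce the same pattern in PC(L) as assimilation of the internal standard.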

The constant errors in this experiment differ 
somewhat from those seen before. The in- 
crease in PCE as Ts shortens which is usually 
found is shown to a small extent by MCS 
data, but not when MSS was used. In MCS 
results an indifference interval at about .25 
second is suggested, but MSS constant errors 
are so small that an indifference interval can- 
not be plausibly located. There is also no 
evidence of lengthening, the increases in PSE 
from Block 1 to Block 2 being negligible. 
Mean increases for the individual subjects 
were 5A: 2, 1, 0, −1; 5B: 2, 2, −2, and −4%. 

These findings might be attributable to the 
use of a fixed range of variable stimuli. This 
would give the subject “knowledge of results,” 
since any change in PSE would produce a 
change in P(L) and P(S). If, when this 
occurs, the subject makes a compensatory ad- 
justment to preserve the original proportions 
of the two responses, he will effectively pre- 
vent lengthening from manifesting itself, and 
will tend to preserve the initial constant errors 
and indifference interval. To the extent that 
he tends to make P(L) = P(S) = .5 constant 
errors will tend to disappear altogether. This 
strategy might be favored when the standard 
is not presented; if so, this would explain the 
smaller constant errors with MSS. 

Another explanation for the absence of 
lengthening might be that it is produced by a 
slowing of the motor response terminating 
Tr or Tp, so that when this response is not 
employed the time measuring mechanisms 
reveal their underlying stability. 


EXPERIMENT 6 


This experiment was designed to show 
whether lengthening would appear if the 
knowledge of results which subjects might 
obtain from changes in P(L) was reduced. 
This was done by adjusting the range of 


variable stimuli as an estimate of the subject’s 
PSE changed. To prevent the experimenter’s 
bias or expectations in any way contributing 
to the result these adjustments were made in 
accordance with a fixed decision procedure pre- 
scribed in advance. 


Apparatus and Procedure 


The same apparatus was used as in Experiment 5. 
Each subject did two sessions, half the subjects having 
a modified MCS in the first session and a modified 
MSS in the second, and the other half the reverse. 
Each session consisted of 360 trials conducted as in 
Experiment 5; Ts was always .5 second. In MSS 
sessions the standard interval was presented two or 
three times before the session began, but not again. 

Values of Tv separated by 5% steps were used. 
The initial range was A = .454, B = .476, C = .500, 
D = .525, and E = .551 second. In each successive 
set of five trials Stimuli A-E were presented in a 
random order determined with the aid of random- 
number tables. After each set of five stimuli had 
been given if 4-5 L responses had been made the 
range was shifted down by one step; if 4-5 S re- 
sponses had been made it was shifted up by one step; 
otherwise it was left unchanged. The subject did not 
know that the range might vary, and pauses between 
successive sets of five stimuli were avoided. 
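
The range-shifting rule can be sketched as follows. It assumes that a "5% step" means multiplication by 1.05, which reproduces the initial values listed above to three decimal places; the function names and the integer shift index are illustrative.

```python
STEP_RATIO = 1.05   # assumed meaning of a 5% step
BASE_C = 0.500      # initial midpoint C in seconds

def stimulus_range(shift):
    """Stimuli A-E (seconds) after the range has moved `shift` steps from the start."""
    return [BASE_C * STEP_RATIO ** (shift + k) for k in range(-2, 3)]

def next_shift(shift, responses):
    """Apply the fixed decision rule to one set of five responses ('L' or 'S')."""
    if responses.count("L") >= 4:
        return shift - 1   # mostly 'longer': move the range down one step
    if responses.count("S") >= 4:
        return shift + 1   # mostly 'shorter': move the range up one step
    return shift

print([round(v, 3) for v in stimulus_range(0)])
print(next_shift(0, list("LLLLS")), next_shift(0, list("SSSSS")), next_shift(0, list("LLSSL")))
```

Since the range moves by at most one step per set of five trials, the midpoint can follow a drifting PSE only slowly.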

Eight subjects were used. One had served in Ex- 
periment 4; the others were naive. MSS sessions 
lasted 50-60 minutes; MCS 65-75 minutes. 


Results 

The midpoint (C) of the range of stimuli 
used for each set of 5 trials was taken as an 
approximate estimate of PSE, and the means 
of the range midpoints determined by the first 
and last blocks of 40 trials were calculated 
for each session. The increase between these 
two values for each session was: 


MCS 
21, 17, 4, 4, 1, 1, 1, and −4% 

MSS 
27, 15, 10, 8, −1, −4, −4, and −47% 

The mean PCE, expressed as a percentage, 
for each session, was MCS: 8, 7, 6, 2, 2, 1, 
−4, −6; MSS: 11, 10, 9, 6, 6, −1, −5, and 
−28%. 


Discussion 


Despite the device of shifting the Tv range 
it would still be possible for a subject to prevent 
any lengthening being shown by recording 
every response and correcting completely 
for any deviation from P(L) = .5. But the 
redundancy of the information available to 
the subject has been reduced; if, for example, 
he did not record all his responses, or did not 
fully correct for deviations, then some degree 
of lengthening might be shown. A limitation 
of the range midpoint as an estimate of PSE 
is that its maximum rate of change is one 
step interval every five trials, so that it will 
lag behind any more rapid change in PSE. 
Despite these deficiencies, six sessions showed 
lengthening exceeding 5%. Thus the stability 
found in Experiment 5 appears to be an arti- 
fact of the use of a fixed range of variable 
stimuli, and it is clear that a terminal motor 
response is not essential for lengthening to be 
shown. 

The changes in the range midpoint varied 
considerably between subjects and sessions. 
The most common patterns were repeated slow 
oscillations; a rise, which might be early, 
late, or sustained through the course of the 
session; or an initial fall with a subsequent 
rise. One subject showed a marked sustained 
fall in her MSS session. 


EXPERIMENT 7 


In the procedures used so far we could suppose 
that the subject reproduces an internal 
standard, or compares it with the variable 
stimulus, and the changes found in Tr, Tp, or 
PSE would then directly reflect changes in 
these standards. A procedure sometimes employed 
in the study of time discrimination is 
the method of estimation (ME) in which a 
time interval, Te, is presented to the subject 
who is required to estimate its length, verbally 
or in writing. Here we could suppose that the 
estimate (E) identifies the internal standard 
corresponding to Te. If this is so, Te will 
correspond to Tv, Tr, or Tp, rather than to Ts. 
This argument, which is developed more fully 
later, leads to an interesting prediction which 
the present experiment is designed to test. 
This is that the underlying changes that are 
responsible for lengthening of Tr, Tp, or PSE 
during the course of a session should lead, 
when ME is employed, to a decrease in the 
estimates of a given Te. We might also expect 
the usual pattern of constant errors to be 

reversed, underestimation being greater at 
shorter intervals. 


Apparatus and Procedure 


The apparatus used was the same as in Experi- 
ment 5. On each trial the onset of the warning light 
was followed 2 seconds later by the presentation of 
Te. The subject had to estimate Te to the nearest 
tenth of a second, record his E on a form provided, 
and then press a reaction key. This signaled to the 
experimenter that the subject was ready for the next 
interval, which was presented in the same way. 

Each subject did two sessions. Before the first ses- 
sion examples of intervals ranging between .1 and 8 
seconds were presented to the subject. Each session 
consisted of 180 trials, divided into five blocks, each 
block consisting of three series of 12 trials each. The 
mean length of the intervals presented in a series 
was 1, 3, or 5 seconds. Each series contained 4 
trials with each of three durations of Te, given in a 
random order determined with the aid of random-number 
tables; for series for which Te = 1.0 second 
these durations were .9, 1.0, and 1.1 second; for 
Te = 3.0 seconds they were 2.7, 3.0, and 3.3 seconds; 
and for Te = 5.0 seconds they were 4.5, 5.0, 
and 5.5 seconds. The variation in the intervals in a 
series was intended to prevent the formation of re- 
sponse habits. Each block had one series at 1.0 sec- 
ond, one at 3.0 seconds, and one at 5.0 seconds in 
that or the reverse order. The order was the same in 
all the blocks of a session, but opposite in a sub- 
ject’s two sessions. Half the subjects began with one 
order; half with the other. 

Six subjects were used. One had served in Experi- 
ment 5; the others were naive. Sessions lasted about 
40 minutes. 


Results 


The mean estimate was calculated for each 
Te in each series of each session. PCE was 
calculated as (E − Te)/Te, and the increases 
in the estimates from the first to the fifth 
blocks of each session were calculated as 
100(E5 − E1)/E1, where E5 and E1 are the 
mean estimates of a given Te in Blocks 5 
and 1. The mean increases for the individual 
sessions (two for each subject) were: −18, 
−17; −13, −4; −32, −2; −7, −26; −50, 
−15; 19 and 0%. The corresponding values 
of PCE were: −25, −41; −36, −40; −30, 
−33; −44, −34; −31, −51; 30 and 56%. 
Mean values of E; PCE, expressed as a percentage; 
and the percentage increases from 
Block 1 to Block 5, for the subjects’ first and 
second sessions, are shown in Table 9. 
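
The two measures defined above can be written out as a brief sketch. The first call uses the first-session mean estimate for Te = .9 second as read from Table 9; the block means passed to the second function are hypothetical, since block-by-block estimates are not tabled.

```python
def pce(estimate, te):
    """Proportionate constant error, as a percentage: 100 * (E - Te) / Te."""
    return 100.0 * (estimate - te) / te

def pct_increase(e_first, e_last):
    """Percentage change in the estimate from Block 1 to Block 5: 100 * (E5 - E1) / E1."""
    return 100.0 * (e_last - e_first) / e_first

# Mean first-session estimate of .73 second for Te = .9 second (Table 9).
print(round(pce(0.73, 0.9), 1))

# Hypothetical block means: 1.00 second in Block 1 and .85 second in Block 5.
print(round(pct_increase(1.00, 0.85), 1))
```

A negative PCE denotes underestimation, and a negative increase denotes a fall in the estimates over the session.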


TABLE 9 
BASIC DATA FOR EXPERIMENT 7 


Session                              Te (seconds) 
                  .9   1.0   1.1   2.7   3.0   3.3   4.5   5.0   5.5 
First 
  E (seconds)    .73   .84   .90  2.14  2.34  2.50  3.26  3.62  3.89 
  PCE            −19   −16   −18   −21   −22   −24   −28   −27   −29 
  Increase       −10   −12   −12   −14   −21   −17   −18   −21   −24 
Second 
  E (seconds)    .76   .85   .90  1.95  2.11  2.33  3.11  3.43  3.68 
  PCE            −15   −15   −18   −28   −30   −30   −31   −31   −33 
  Increase       −21   −21   −16     1    −4    −1   −12   −10    −8 
All 
  E (seconds)    .75   .85   .90  2.05  2.23  2.41  3.18  3.53  3.79 
  PCE            −17   −15   −18   −24   −26   −27   −29   −29   −31 
  Increase       −16   −16   −14    −6   −13    −9   −15   −15   −16 
Discussion 


Five of the six subjects showed a mean decrease 
in their estimates of constant time 
intervals during the course of both sessions; 
the sixth showed an increase in one session 
and no change in the other. This parallels 
the high proportion of sessions showing lengthening 
in earlier experiments, and confirms the 
prediction this experiment was designed to 
test. 

The mean Es for each Te are very similar 
in the first and second sessions; in each case 
they underestimate Te. Since with ME a decrease 
in E replaces the lengthening found 
with other procedures, it might be thought 
that the usual pattern of constant errors would 
similarly be replaced by a tendency for underestimation 
to be greater at shorter values of 
Te, but this is not shown by the mean results; 
instead greater underestimation appears at the 
longer intervals. Since lengthening was found 
to be proportionately greater at shorter intervals, 
we might also expect E to show proportionately 
greater decreases at the shorter durations 
of Te, but though mean decreases are 
absolutely larger for Te = 1.0 second than for 
Te = 5.0 seconds in the subjects’ second sessions, 
the reverse is shown in the first sessions. 
Examination of individual results suggested 
that if the one subject who did not give a mean 
decrease in each session was excluded, the 
sessions showed two patterns which differed in 
a number of ways. (a) In the sessions in 
Group H (five sessions given by four subjects, 
two being first sessions and three second ses- 
sions) the estimates for T, = 1.0 second in 
Block 1 were high (mean = 1.12 second, 
range = .79—1.62 second); in Group L (five 
sessions given by four subjects, three being 
first sessions and two second sessions) they 
were low (mean = .56 second, range = .34- 
.77 second). There was no similar difference 
for T, = 3.0 or 5.0 seconds in Block 1; in 
Group H the mean estimates for these inter- 
vals were 2.18 and 3.33 seconds; in Group L 
they were 2.13 and 3.69 seconds. (b) In each 
session of Group H PCE for T, = 1.0 second 
was larger algebraically than that for T= 5.0 
second (i.e., underestimation was greater at 
the latter interval) both in Block 1 and for 
the session as a whole. In Group L there was 
greater underestimation at 1.0 second than at 
5.0 seconds in Block 1 of each session and, 
in four cases, in the session as a whole. (c) In 
Group H the decrease from Block 1 to Block 5 
for T, = 1.0 second was greater than that for 
T, = 5.0 seconds in four cases, and equal in 
the fifth (mean increase for 1.0 second was 
—39%; for 5.0 seconds, —21%); in Group L 
the decrease at 5.0 seconds was greater in each 
session (mean increase for 1.0 second was 
1%; for 5.0 seconds, —22%). The changes 
in PCE in successive blocks for Groups H and 
L are plotted in Figure 5. (d) Four of the five 


16 MICHEL TREISMAN 


sessions in Group H began with a T, = 1.0 
second series, and one began with T, = 5.0 
seconds; the reverse was true of Group L. 
Though this difference is not significant it sug- 
gests that whether a session falls into Group H 
or Group L may depend mainly on this fea- 
ture of procedure. For each of the six sub- 
jects the mean value of E for the T, = 1.0 
second series in the first block was less in the 
session which began with a T, = 5.0 seconds 
series than in the session in which the T, = 1.0 
second series came first. This is significant 
(two-tailed binomial theorem, p = .03); it 
might be due to a “contrast” effect of the 
responses made to the intervals in the T, = 5.0 
and T, = 3.0 seconds series on those made 
when the T, = 1.0 second intervals are pre- 
sented, or it might be due to the lengthening 
effect, here manifesting as a decrease in E 
during the course of the first block. In either 
case it would reduce the possibility of demon- 
strating a fall in Æ at short intervals. 
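
The significance level quoted for the six-subject comparison can be checked with a short sketch of an exact two-tailed binomial (sign) test. The function name is hypothetical, and the test assumes that under the null hypothesis each subject is equally likely to show a difference in either direction.

```python
from math import comb


def two_tailed_sign_test(successes, n):
    """Exact two-tailed binomial probability, with p = .5, of a split at
    least as extreme as `successes` out of `n`."""
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# All six subjects showed the difference in the same direction.
p_value = two_tailed_sign_test(6, 6)
print(round(p_value, 2))
```

With six subjects, a unanimous result is the only outcome that reaches the 5% level on this test.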

The differences in Figure 5 between the curves for $T_e = 3.0$
and 5.0 seconds in the two groups are relatively small, but the
curves for $T_e = 1.0$ second differ considerably. Comparison of
the two groups suggests that the proportionate decreases at
shorter intervals will be greater provided that estimates of
these intervals are not initially too small. If they are small,
there may be very little decrease. This might be due to the fact
that the responses available are limited by the closeness of the
extreme response of “0 seconds”; the proximity of this “anchor”
may cause resistance to a reduction of estimates much below half
a second.

Fig. 5. Mean PCEs of the estimates in each block of the sessions
in Groups H and L in Experiment 7 are plotted separately for
$T_e = 1.0$, 3.0, and 5.0 seconds.

Greater underestimation at the shorter intervals is shown
initially in Group L, but this effect becomes less marked during
the course of the session since the estimates of the longer
intervals fall but the change at shorter intervals is limited.


GENERAL DISCUSSION


We can now summarize the main features 
of temporal discrimination that have been 
demonstrated or confirmed in these experi- 
ments. 


Weber Function 


It was found that $\Delta T = k(T + a)$ adequately described the
relation between the difference thresholds and time intervals in
Experiments 1, 2, and 5, in which MR, MP, MCS, and MSS were
used. Deviations from linearity of regression reached the 5%
significance level in Experiments 2 (MP) and 5 (MSS), but the
data did not suggest an alternative function.

The Weber fraction did not decrease to a minimum at a short
interval in Experiments 1 and 2. It was suggested that this
frequently reported finding might sometimes be due to the
development of rhythmic double responses with decreased
variance. When, in Experiment 4, the procedure was altered to
favor this type of response, minimal Weber fractions appeared.
The advantage of rhythmic responding is probably not due to a
decrease in the variance of the motor responses themselves. In
Experiment 4A, in which two responses were required, the
variance at long intervals, which presumably do not evoke
rhythmic responding, was not larger than in Experiments 1 and 2
(see Table 5), and the decrease in the standard deviations in
Experiment 5, as compared with Experiment 2, is small. More
likely the interaction of the neuromuscular coordinating
mechanisms with the properties of the effectors provides less
variable temporal cues, at certain short intervals, than do the
basic mechanisms. This is supported by an experiment by Davis
(1962) who made subjects reproduce intervals of 2, 4, or 8
seconds while counting at controlled rates. He found that the
variability of $T_r$ was least for rates of 1-2 per second and
increased at greater or lesser rates.


When the subject is not required to delimit an interval
physically, other rhythmic movements, such as subvocal counting,
slight head movements, etc., might be used at suitable intervals
as an aid in making judgments. This, the variability of the
data, and the possibly inconstant effects on variance of the
sequential dependencies demonstrated in Experiment 5 may account
for the minima shown in some sessions in that experiment.

Although it remains possible that failure 
to find a minimum, rather than its presence, 
is the artifact, it seems most plausible to sup- 
pose that the operation of the basic time- 
keeping mechanisms is best described by the 
Weber function, and that the occurrence of 
minima is due to extra factors, such as those 
which have been suggested.


Constant Errors 


Experiments 1, 2, 4, and 5A showed the 
classical greater overestimation, or lesser 
underestimation, of short intervals as com- 
pared with long, with an indifference interval 
except when all the intervals were overesti- 
mated or underestimated. This was not found 
in Experiment 5B, perhaps because a fixed
$T_v$ range was used in the absence of a standard
stimulus. No evidence was found for
Hollingworth’s explanation; the indifference 
interval did not settle at the median of the 
range of intervals during the course of a ses- 
sion, nor did it decrease when the range was 
shifted down in Experiment 2. 


Lengthening 


A tendency for the mean $T_r$ or $T_p$, in Experiments 1-4, or
an estimate of PSE, in Experiment 6, to increase, and of E, in
Experiment 7, to decrease during the course of a session was
found. These effects were not found in every session, but
shortening was less frequent and less marked than lengthening.
Lengthening was absent in Experiment 5, perhaps because of the
use of a fixed $T_v$ range. When this constraint was relaxed, in
Experiment 6, it reappeared.

The lengthening was proportionately greater 
for the shorter intervals; this was significant 
in Experiments 1A and 2. This is not attrib- 
utable to superimposing the central tendency 
effect on a constant rate of increase. In Experiment 3
only one value of $T_s$ was used in
each session, but greater proportional changes 
were still obtained with the smaller standard. 
It is not due to a slowing of the motor response 
terminating the intervals, which would add a 
constant increase to all productions and repro- 
ductions. The absolute increases between the 
first and final blocks appear to get larger as 
$T_s$ increases in Experiments 1A, 2 (MP), and
4; the simple reaction time does not increase 
during the session in Experiment 3; lengthen- 
ing is shown in Experiment 6 although the 
motor response is excluded as a possible cause 
by the use of MCS and MSS; and a decrease 
in E, which may be attributed to the same 
underlying causes as lengthening, is shown 
with ME in Experiment 7. 

In three sessions in Experiments 3, 4, and 6 
shortening of the means exceeding 5% was 
seen. In one session in Experiment 7 a corresponding
increase in E was obtained.


Intertrial and Interseries Effects 


In Experiment 5 a sequential effect, such as 
would be expected if assimilation were present, 
was demonstrated in half the sessions, and it 
was shown that this was due to the preceding 
variable stimulus not to the response made to 
it. In one session a contrast effect was appar- 
ently present. 

These effects were due to the influence on a 
judgment of an immediately preceding stimu- 
lus. The question arises whether any similar 
relation between successive series of productions or
reproductions was present in Experiments 2 and 4A. In these
experiments series for which $T_s = .5$ or 1.0 second sometimes
followed immediately after series of larger (1.0 or 3.0 seconds)
or smaller (.25 or .5 second) intervals. Mean PCEs were
calculated for the .5- and 1.0-second series in each case, and
since the effects found were similar the constant errors have
been averaged and are given in Table 10. The result for each
experiment is similar: there is considerable assimilation
of MP series to the preceding series,
but there is little if any effect on MR series. 
In Experiment 2 MP series were always pre- 
ceded by MR series and vice versa; in Experi- 
ment 4 only one procedure was used in each 
session. Thus the similar result for the two 
experiments shows that the inclusion of stand- 
ard intervals, as such, in the preceding series 
has little effect on the assimilation produced, 
but that MP series are more susceptible to 
this than MR series. 

In Experiment 2 each session consisted of 
an ascending block followed by a descending 
block (an A-D session), or the reverse. In 
ascending blocks shorter intervals precede the 
longer so that assimilation should tend to 
reduce the mean PCE, and it should have the 
opposite effect in descending blocks. Table 10 
shows that in three cases out of four PCE is 
larger in the descending block than in the 
corresponding ascending block, the differences 
being greater with MP. 

If assimilation were transient this might explain the greater
effect on MP series which, with only one stimulus on each trial,
are shorter than MR series. Mean PCEs were calculated for each
position in the series for all $T_s = .5$ or 1.0 second series
immediately preceded by series of smaller or larger intervals in
Experiments 2 and 4A; the results, combined for the two
experiments and for both intervals, are shown in Figure 6. This
shows that assimilation is not transient but persists throughout
the series; if anything, there is a slight suggestion of a
transient initial contrast effect.


TABLE 10
MEAN PERCENTAGE CONSTANT ERRORS

Experiment                          Preceded by:
                                Larger T_s    Smaller T_s
2, T_s = .5 or 1.0 second
  MR                                14            12
  MP                                24             0
4A, T_s = .5 or 1.0 second
  MR                                 5             6
  MP                                52            25

2, All series                   First block   Second block
  MR
    A-D sessions                     6            12
    D-A sessions                    10            14
  MP
    A-D sessions                     9            18
    D-A sessions                    18            11


Fig. 6. Mean PCEs are plotted against serial position within the
series for all L series (i.e., those series immediately preceded
by a series of longer intervals) and S series (those immediately
preceded by shorter intervals) for which $T_s = .5$ or 1.0
second in Experiments 2 and 4A.


MODEL OF THE TIME KEEPING MECHANISM 


A model will now be presented which was 
devised in an attempt to explain and relate 
the psychophysical findings which have been 
described. It derives from suggestions which 
have been made before (Helson, 1959; Hoag- 
land, 1933) and attempts to put them together 
in not too arbitrary a fashion to provide a 
coherent mechanism capable of explaining as 
many findings as possible in related ways. 
The principle followed is to postulate the least 
number of components necessary to allow time 
judgments to be made, and the basic assump- 
tion is that the mechanism is to some extent 
error correcting so that an error or limitation 
at one point will be partly compensated for by 
adjustments at other points. If one component 
is then assumed to show a consistent departure 
from perfect accuracy in its mode of opera- 
tion, this, with the adjustments it provokes, 
will determine a pattern of performance which 
can be compared with that shown by human 
subjects. The components of the model will be described and
their modes of operation will be presented as a set of
postulates; consequences will then be derived which can be
compared with the experimentally determined performance.


Components (see Figure 7) 


A pacemaker produces a series of pulses 
which travel along a pathway. A counter re- 
cords the number of pulses arriving at a given 
point and transfers this measure to the store. 
Measures in the store can be retrieved by the 
comparator (decision mechanism) which com- 
pares retrieved measures with current counts 
made by the counter and selects responses for 
the response mechanisms to make. A verbal 
selective mechanism acting on the store assists 
in retrieval, and a specific arousal center acts 
on the pacemaker and can modify its rate. 


Operations 


Postulate 1. The pacemaker produces a regular sequence of pulses
which travel along the pathway at a constant rate. The basic
interpulse interval is $t_0$ seconds, with variance
$\sigma_t^2$.

Hoagland (1933) compared his “chemical clock” to pacemaker
neurons. These produce impulses at rates which may vary and
transmit them along axons whose speeds of conduction are
relatively fixed. Analogous assumptions are made here, but no
hypotheses about the neural identity of the components of the
model are intended.


Fig. 7. A model of the internal clock. (The components are
described in the text.)

Postulate 2. Facilitation of the pacemaker by the specific
arousal center affects the rate at which pulses are produced. A
given level of specific arousal will determine a mean interpulse
interval, $t'$, given by $t' = qt_0$, where q > 1 when specific
arousal is low, and q < 1 when it is high. Specific arousal may
be affected by features of procedure, or meaningful aspects of
the experimental situation, or by other factors; under constant
conditions, as during the presentation of a time interval, it
would usually be constant or change only slowly (for the time
intervals studied here). Thus there will be little variation in
$t'$ during the course of any one trial, but its value will vary
from trial to trial.

Postulate 3. The counter may record pulses in two ways: (a) If
the measure is to be transferred to the store, the count will
begin and end at the same fixed point on the pathway. Thus if an
interval $T_s = nt_0$, delimited by initial and terminal
stimulus events $S_1$ and $S_2$ (such as the onset and ending of
a continuous tone), is presented, counting would begin at Point
A at $S_1$ and finish at $S_2$. When q = 1 the measure recorded
should be n. (b) If the measure is to go to the comparator, the
count will begin at Point B and end at Point A. Thus if the
counter were required to record s pulses, and the arrival of
Pulse $p_j$ at Point B immediately precedes $S_1$, then the
count would begin when $p_{j+1}$ arrives at Point B, and it
would end when $p_{j+s}$ is recorded at Point A. If
$\rho = rt_0$ is the constant time taken for a pulse to travel
from A to B, and q = 1, then the time taken to complete this
count, $T_c$, would be given by
$T_c = st_0 - \rho = (s - r)t_0$.
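
The counting rule of Postulate 3b can be written as a one-line computation. This is a minimal sketch with a hypothetical function name and illustrative parameter values; the extension to q other than 1 simply substitutes the interpulse interval q*t0 for t0 while holding the travel time r*t0 constant, which is the form the model uses for Equation 9.

```python
def count_duration(s, r, t0, q=1.0):
    """Time needed to count s pulses when the count starts at Point B and
    ends at Point A: s * t' - rho, with t' = q * t0 and rho = r * t0."""
    t_prime = q * t0   # interpulse interval under the current arousal level
    rho = r * t0       # constant travel time from A to B
    return s * t_prime - rho


# Illustrative values only: a 100-msec. interpulse interval and r = 2.
print(count_duration(s=10, r=2, t0=0.1))
```

When q = 1 the result reduces to (s - r) * t0, as stated in the postulate.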

Postulate 4. Measures are “read into” the store by the counter,
and retrieved from it by the comparator. The store can be
considered as a (functionally) unidimensional array of locations
or addresses, one of which is the “zero point.” Each location
corresponds to a measure, the correlate of the measure being the
distance of that location from the zero point. (The notches on a
measuring stick provide a simple analogy for this.) Thus if the
counter has recorded n pulses, this measure will be read into
the location whose distance from the zero point is log n.
Locations can vary in their degree of activation; by activation
is meant a feature of the state of a location on which the
probability that it will be selected for retrieval may depend.
The effect of “reading a measure into the store” is to increase
the activation of the corresponding location. Some activation of
neighboring locations will also be produced, giving a
distribution of activation with its mode, in this example, at
log n.

Activation from different sources will sum. It will also tend to
fade so that the store is essentially short term. When the
subject attends to $T_s$, rather than incidentally recording
$T_r$ or $T_p$, more activation may be produced, giving longer
retention.

The assumption that distance in the array 
is a logarithmic function of the counted meas- 
ures is not critical for the model; it could be 
replaced by any monotonic function and, with 
suitable adjustments, the same predictions 
could be derived. But it is convenient, and is 
in accord with arguments given elsewhere 
which favor the use of logarithmic “psycho- 
logical” scales (Treisman, 1962, 1963, 1964a). 

Postulate 5. The verbal selective mecha- 
nism provides a long-term memory store 
which can be operated by symbolic perceptual 
cues. It is a repository of “verbal labels” such 
as 1 second, 2 minutes, etc., each of which is 
linked to a particular location in the store. 
When a verbal label is identified by an in- 
struction to the subject it will produce a dis- 
tribution of activation in the store, centered 
at its point of attachment. This will sum with 
residual activation from other sources, such 
as presentations of $T_s$, the mode of the combined
distribution determining the locus of
retrieval. 

When there is a large disparity between the 
distributions produced by the verbal label 
and the corresponding $T_s$, perhaps when they
do not sum to give a single mode, the attach- 
ment of the verbal label may be adjusted to 
reduce this disparity. 

Postulate 6. The comparator can retrieve measures from the store
and can compare these retrieved measures with inputs from the
counter to decide when to terminate $T_r$ or $T_p$ or which
response to select in MCS, MSS, or ME. (a) To retrieve a measure
the zero point and the selected location, usually the maximally
activated location, are identified, and an antilogarithmic
transformation is applied to the distance between them. In doing
this the comparator may consistently misplace the zero so that
the quantity that is transformed is given by log n + log m,
where log m is a measure of the shift from the true zero point.
(b) After the exponential transformation a constant quantity, r,
is added, so that the retrieved measure (RM) is given by,

$$\mathrm{RM} = \exp(\log n + \log m) + r = mn + r \qquad [8]$$

(c) In MP or MR when $S_1$, the stimulus event initiating $T_r$
or $T_p$, is received the counter will operate as in Postulate
3b, and the count will be continuously compared with RM by the
comparator. When $p_{j+(mn+r)}$ has arrived at Point A the
interval will be terminated. The time taken to complete this
count, $T_c$, will be given by,

$$T_c = (mn + r)t' - \rho = mnqt_0 + (q - 1)rt_0 \qquad [9]$$

If m = q = 1, then $T_c = nt_0 = T_s$. Neglecting motor response
time, $T_r$ or $T_p$ will equal $T_c$. (d) In MCS or MSS the
counter will operate in the same way, and the comparator will
determine whether $S_2$, the stimulus event terminating $T_v$,
or $p_{j+(mn+r)}$ is received first. In the former case it will
select the response shorter; in the latter, longer. PSE, taken
as the duration of $T_v$ which gives P(L) = .5, will equal
$T_c$. (e) In ME the counter will operate in the same way on
receipt of $S_1$, and a process of retrieval will be initiated
which will not depend on identifying a maximally activated
location, but will be continuous. The successive locations in
the store will be retrieved in turn so that at any interval,
$T_c$, after $S_1$, the relation of $T_c$ to log n, the location
under retrieval, will be given by Equation 9. When the terminal
stimulus event, $S_2$, arrives the comparator will identify the
location, log $n_r$, then under retrieval (this will be
equivalent to solving Equation 9 for n when $T_c = T_e$) and the
verbal label attached to this location in the store will
determine the subject’s E. If $T_e = n_et_0$, $n_r$ will be
given by,

$$n_r = (n_e + r - rq)/mq \qquad [10]$$
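
Equations 8 to 10 can be checked against one another numerically. The sketch below uses hypothetical function names and purely illustrative parameter values (no values of m, q, r, or the basic interpulse interval are estimated in the text). It confirms that the error vanishes when m = q = 1 and that Equation 10 is the inverse of Equation 9.

```python
import math


def retrieved_measure(n, m, r):
    """Equation 8: RM = exp(log n + log m) + r = m*n + r."""
    return math.exp(math.log(n) + math.log(m)) + r


def completion_time(n, m, r, q, t0):
    """Equation 9: T_c = (m*n + r) * t' - rho, with t' = q*t0, rho = r*t0."""
    return (m * n + r) * q * t0 - r * t0


def location_under_retrieval(n_e, m, r, q):
    """Equation 10: n_r = (n_e + r - r*q) / (m*q)."""
    return (n_e + r - r * q) / (m * q)


t0, r = 0.05, 4          # illustrative: 50-msec. pulses, r = 4 pulses

# With m = q = 1 the produced interval equals the standard, T_s = n * t0.
print(completion_time(n=20, m=1.0, r=r, q=1.0, t0=t0))

# Equation 10 solves Equation 9 for n when T_c = T_e = n_e * t0.
n_e, m, q = 60, 0.9, 1.2
n_r = location_under_retrieval(n_e, m, r, q)
print(n_r, completion_time(n_r, m, r, q, t0), n_e * t0)
```

The second printed line shows the location under retrieval when the terminal stimulus arrives, followed by two equal durations, which is the sense in which Equation 10 inverts Equation 9.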


The events which correspond to perceiving and retaining an
interval and selecting a response have now been described. We
could suppose that r represents noise which is included in the
retrieved measure by the channel conveying it to its point of
application, or it could be interpreted as an arbitrary,
unavoidable error in the retrieval process, or in some other
way. In each case other features of the model can then be
understood as attempts to compensate for the addition of this
quantity. The first such feature is the mode of counting
described in Postulate 3b. If RM were compared with a count made
at one point on the pathway only, then $T_c$ would have added to
it a constant error equal to $rt'$. The effect of starting the
count at B and completing it at A is that when m = q = 1 this
error is completely compensated for, as can be seen from
Equation 9. When $q \neq 1$ an error $rt' - rt_0$ remains, and
there is as well the error due to the divergence between $nt'$
and $nt_0$; the addition of log m in Postulate 6a can be
understood as an attempt to compensate for these. As will be
shown, the optimal value of log m for this purpose varies with
$T_s$ and q. For log m to correct error effectively it must,
therefore, be capable of being adjusted as its effects on
performance change. Two sources of corrective “feedback” can be
suggested.
Postulate 7. If the subject is selecting be- 
tween the responses L and S he may record 
his response frequencies and adjust log m to 
reduce deviations of P(L) from .5. 

Postulate 8. Activation of a location in the 
store will shift log m towards the optimal 
value for the corresponding interval; thus the 
value of log m at any time will be in part a 
function of the average effect of all locations 
then active. 

For the sake of clarity, and in order to show that a model of
this sort could be constituted from simple elementary
mechanisms, an attempt has been made to present it in a detailed
and concrete form. But its main weight lies in the definition of
parameters and their interrelations, arbitrary or “error
correcting,” and their relations to features of the experimental
situation, as expressed in the equations given, and the
predictions resulting from them. Details such as the suggested
identification of r, the logarithmic form of the short-term
store, or the description of q as an “arousal” effect, can be
considered as minor subsidiary hypotheses; they could be
considerably varied with little effect on many of the
predictions which can be derived from the main assumptions of
the model. Some predictions will now be presented and discussed.


Some Consequences of the Model 


Weber Function.


1. $T_r$, $T_p$, or PSE in MR, MP, MCS, or MSS all correspond to
$T_c$ in Equation 9, and $\Delta T$ will similarly correspond to
$\sigma_{T_c}$. The duration of $T_c$ on any one trial is given
by,

$$T_c = \sum_{i=1}^{mn+r} t_i - \rho$$

It follows from Postulate 2 that on any one trial the values of
the interpulse interval, $t_i$, which are summed to give $T_c$,
will be very similar, though there will be variation between
trials. Consequently the product-moment correlations between
$t_i$ and $t_j$, where these are any two of the summed
interpulse intervals, will be positive and large. If we suppose
them to approximate to 1 then, since $\rho$ is a constant,

$$\sigma_{T_c} = (mn + r)\sigma_t$$

Therefore we can write

$$\Delta T = \sigma_t(mn + r) = \frac{\sigma_t}{t'}(T_c + rt_0) = k(T + a) \qquad [11]$$

If we ignore motor response variance in MR and MP and the
physical variance of $T_v$ in MCS and MSS, this equation
predicts the experimental relation between $\Delta T$ and T. It
is, of course, the Weber function, which appeared to describe
this relation best in the earlier experiments. It provides
interpretations for the empirical constants k and a in terms of
the model.
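
The step from perfectly correlated interpulse intervals to the Weber function can be illustrated by simulation. The following is a minimal sketch under stated assumptions: the interpulse interval is drawn once per trial from a normal distribution (the model does not specify a distribution), the within-trial correlation is taken as exactly 1, and all parameter values and names are hypothetical.

```python
import random
import statistics


def simulate_sd_by_interval(n_values, m=1.0, r=4, t0=0.05, q=1.0,
                            sd_fraction=0.1, trials=2000, seed=1):
    """Return (k, a, rows); each row is (mean T_c, SD of T_c) for one n."""
    rng = random.Random(seed)
    # One interpulse interval per trial: constant within a trial, varying
    # between trials, i.e. within-trial correlations of exactly 1.
    t_primes = [rng.gauss(q * t0, sd_fraction * q * t0) for _ in range(trials)]
    rho = r * t0  # constant travel time from A to B
    rows = []
    for n in n_values:
        durations = [(m * n + r) * t - rho for t in t_primes]
        rows.append((statistics.fmean(durations), statistics.pstdev(durations)))
    k = statistics.pstdev(t_primes) / statistics.fmean(t_primes)
    a = r * t0
    return k, a, rows


k, a, rows = simulate_sd_by_interval([5, 10, 20, 60, 100])
for mean_t, sd_t in rows:
    print(f"T = {mean_t:6.3f}  SD = {sd_t:.4f}  k(T + a) = {k * (mean_t + a):.4f}")
```

Under these assumptions the standard deviation of the simulated intervals coincides with k(T + a) at every interval, with k equal to the coefficient of variation of the interpulse interval and a equal to the travel time between the two points on the pathway.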

2. Equation 11 applies whether or not $T_s$ is presented on each
trial. When a standard is used it might seem that variation in
the retrieved measure should add to the response variance, since
if the mean interpulse interval, $t'$, varies from time to time,
the counts made should vary accordingly: $T_s = nt_0 = n't'$.
But the variation in RM will not be as great as that in $n'$,
since the retrieval mechanism selects the locus of maximal
activation, and this will be determined by summation of the
residual activation produced by presentations of $T_s$ over a
period (and activation from the verbal label). If the activation
is at all persistent, the variance of the mode of the summed
distribution will be very much less than the variance of $n'$.
In Experiments 1, 2, and 5 the Weber functions for MP and MR, or
for MSS and MCS, did not differ significantly, in each case,
from a common function fitted to the combined data. This
suggests that any variance due to presentation of the standard
is relatively small. Standard deviations tended to be a little
larger with MP than with MR, perhaps because $T_p$’s were
sometimes greater than the corresponding $T_r$’s.

3. Equation 11 should apply equally to 
methods using production and comparison. If 
Equations 4 and 5 are compared with Equa- 
tions 6 and 7, similar values of a and k are 
found. The standard deviations in Experi- 
ment 5 are smaller than those in Experiment 2 
in six out of eight cases, perhaps due to an 
effect of motor response variance or the larger 
size of the T,’s and T,’s in the latter experi- 
ment, 


Constant Errors. 


4. The values of PCE predicted for the methods of production and
comparison are given by,

$$\mathrm{PCE} = (T_c - T_s)/T_s = (mq - 1) + (q - 1)rt_0/nt_0 \qquad [12]$$

When m = q = 1 there will be no error at any interval. For
q > 1 > mq PCE will be positive when $T_s$ is small, and will
decrease as $T_s$ increases, becoming asymptotic to mq - 1,
which is negative, when $T_s$ is large. This is the classical
pattern which was confirmed in Experiments 1, 2, 4, and 5A. If
mq > 1 > q the opposite phenomenon would be predicted, i.e.,
underestimation of short and overestimation of long intervals.
Though this is not often found it has been described (Woodrow,
1934). Equation 12 can be written,

$$\mathrm{PCE} = d + e(1/T_s) \qquad [13]$$

where d and e are constants. This equation has been fitted to
the data of Experiments 1A and 1B for each block separately, and
the resulting curves are shown in Figure 1. The theoretical
curves give a fair fit to the data for Blocks 2 and 3. This is
especially satisfying in view of the approximations involved in
assuming that q and m are constant for each block, as required
by Equation 13. The value of log m will depend at any time on an
average effect of recently presented time intervals. Figure 1
suggests that once the full range of intervals has been
presented log m varies relatively little subsequently.
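
The classical pattern predicted by Equation 12 can be seen by evaluating it for illustrative parameters. In this sketch the function names and parameter values are hypothetical. The indifference interval is obtained here by setting Equation 12 to zero; that expression is a consequence worked out for this example rather than a formula given in the text.

```python
def pce_production(t_s, m, q, r, t0):
    """Equation 12 as a proportion, using T_s = n * t0:
    PCE = (m*q - 1) + (q - 1) * r * t0 / T_s."""
    return (m * q - 1.0) + (q - 1.0) * r * t0 / t_s


def indifference_interval(m, q, r, t0):
    """T_s at which Equation 12 gives zero error (requires q > 1 > m*q or
    m*q > 1 > q): T_s = (q - 1) * r * t0 / (1 - m*q)."""
    return (q - 1.0) * r * t0 / (1.0 - m * q)


# Illustrative case with q > 1 > m*q.
m, q, r, t0 = 0.75, 1.2, 4, 0.05
for t_s in (0.25, 0.5, 1.0, 3.0):
    print(t_s, round(100 * pce_production(t_s, m, q, r, t0), 1))
print("indifference interval:", round(indifference_interval(m, q, r, t0), 3))
```

In the notation of Equation 13, d corresponds to mq - 1 and e to (q - 1)rt_0, so fitting d and e to a block of data fixes these two combinations of the model's parameters.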

A similar prediction can be made for ME. If it is assumed that
$E = n_rt_0$ then PCE is given by,

$$\mathrm{PCE} = (E - T_e)/T_e = -(mq - 1)/mq - (q - 1)rt_0/mqT_e \qquad [14]$$

In this case there will be no error at any interval when
m = q = 1, and when q > 1 > mq PCE will be negative for small
values of $T_e$ and will increase as $T_e$ increases, becoming
asymptotic to -(mq - 1)/mq, which is positive, for large values
of $T_e$. This pattern was shown in some but not all of the
sessions in Experiment 7. The prediction is, however, very
dependent on the extra assumption that $E = n_rt_0$ and may not
hold for a different relation. Thus if $E = n_rt_0 + c$, where c
is a sufficiently large constant, the opposite pattern, an
increase in PCE as $T_e$ decreases, would be produced.
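
Equation 14 can be checked by computing the estimate directly from Equation 10. The sketch below uses hypothetical function names and illustrative parameter values, and assumes, as the text does for this prediction, that the estimate equals the retrieved count multiplied by the basic interpulse interval.

```python
def pce_estimation(t_e, m, q, r, t0):
    """Equation 14 as a proportion:
    PCE = -(m*q - 1)/(m*q) - (q - 1) * r * t0 / (m*q * T_e)."""
    mq = m * q
    return -(mq - 1.0) / mq - (q - 1.0) * r * t0 / (mq * t_e)


def pce_from_equation_10(t_e, m, q, r, t0):
    """Same quantity computed stepwise: n_r from Equation 10, then
    E = n_r * t0 and PCE = (E - T_e) / T_e."""
    n_e = t_e / t0
    n_r = (n_e + r - r * q) / (m * q)
    estimate = n_r * t0
    return (estimate - t_e) / t_e


# Illustrative case with q > 1 > m*q.
m, q, r, t0 = 0.75, 1.2, 4, 0.05
for t_e in (0.25, 1.0, 5.0):
    print(t_e, round(100 * pce_estimation(t_e, m, q, r, t0), 1))
```

With these values PCE is negative at the shortest interval and rises toward -(mq - 1)/mq as the interval lengthens, which is the pattern described for q > 1 > mq.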


Lengthening. 


5. Lengthening or shortening will be shown in the methods of
comparison and production when the value of q in Equation 9
increases or decreases. Lengthening was observed in Experiments
1, 2, 3, 4, and 6; shortening was found in a few sessions. In
ME, if E is a monotonic function of $n_r$, Equation 10 implies
that an increase in q will cause a decrease in E. In Experiment
7 this was found in all sessions of five of the six subjects. In
one session estimates increased.

These phenomena are attributed to changes 
in the activity of the specific arousal center, 
increased facilitation of the pacemaker giving 
shortening, decreased facilitation resulting in 
an increase in $t' = qt_0$. Why do these changes
occur? The lengthening observed showed no 
very regular or predictable pattern; subjects 
varied considerably among themselves and from 
session to session. In Experiment 2 lengthening did not recur
in similar form in successive sessions, as might have been
expected of fatigue, nor did it change in any systematic
way, as might a practice effect. In the experi- 
mental situation the subject was exposed to a 
long period of repetitive and monotonous 
stimulation, and, as would be expected (Os- 
wald, 1962), reports of drowsiness were fre- 
quent; but they seemed just as common with 
subjects who did not show lengthening—or 
even when shortening occurred—as with those 
who did. In Experiment 2 lengthening was 
most marked in the first and last sessions; in 
Experiment 4, in the first session; and in Ex- 
periment 6, in the second session. In this last 
experiment the course followed by lengthen- 
ing, when it occurred, was very variable, the 
most common patterns being a steady gradual 
rise, little change for some time and a late 
rapid rise, or an initial drop with a later rise. 
Consideration of the subjects’ reports sug- 
gested the possibility of links with boredom 
and anxiety. Lengthening appeared to be more 
frequent when subjects complained of bore- 
dom and to be less often shown if interest in 
some detail of the procedure was reported. 
On the other hand, those few subjects who 
showed marked shortening (or an increase in 
E in Experiment 7) appeared to be more tense 
or uneasy in the experimental situation than 
the average, suggesting that a high anxiety 
level may tend to maintain specific arousal. 
It is perhaps of interest that Falk and Bindra 
(1954) obtained shortened productions when 
their experimental program included electric 
shocks. A general parallel is suggested with 
the elusive variable or variables often blamed 
for labile aspects of performance and some- 
times labeled “interest,” “alertness,” “vigi- 
lance,” “motivation,” or “arousal.” To avoid 
unwanted connotations the noncommittal term 
“specific temporal arousal” has been pre- 
ferred, taken as an effect acting on the pace- 
maker which may or may not be correlated 
with other variables. We could propose as a
subsidiary hypothesis that “specific arousal” 
is similar in nature to “general arousal,” and 
may show similar features. Thus we might 
expect it to be affected by subjects’ expecta- 
tions or attitudes or by the variety or mo- 
notony of the stimulus input or task, tending 
to decrease with monotonous stimulation. We 
might, for example, then explain the increase
in the mean PCE in successive sessions shown
by each subject in Experiment 2 as due to a 
change in attitude; with repetition of the 
monotonous and increasingly familiar hour’s 
work this causes a fall in motivation or arousal, 
giving an increase in the mean value of q. 

6. In MR or MCS T_s is presented on each
trial, and it would seem that this should compensate
for changes in q, since as the interpulse
interval lengthens the number of pulses
counted during presentations of the standard
interval should decrease. If t_0 increases to t',
n will change to n', where T_s = nt_0 = n't'.
This will not give complete compensation, 
however, since the retrieved measure will not 
change at the same rate. RM is determined 
by the maximally activated location in the 
store (Postulate 6), and this will lie at the 
mode of the distribution produced when the
activation resulting from reading log n’ into 
the store is added to the residual effects of 
previous presentations of Ts, especially if they 
are recent, and to the activation produced by 
the verbal label. The shift in this mode will 
be less than the shift in n', so that shortening
or lengthening should be partly but not completely
compensated for when MR or the
modified MCS used in Experiment 6 are employed.
In Experiments 3, 4A, and 6 only
one procedure was used in each session, and
there are an equal number of sessions in which
T_s was or was not presented on every trial.
If the measures of shortening or lengthening,
disregarding sign (and considering only the
first four blocks in MP sessions in Experiment 4A),
are compared, they tend to be
greater when T_s is not presented, as predicted
(p < .05, Kolmogorov-Smirnov two-tailed test;
Siegel, 1956).

Partial compensation for changes in q and 
the reduction of large disparities between ver- 
bal labels and standards (Postulate 5) would 
both predict smaller mean PCEs in the T_s
conditions. For the same three experiments 
this is confirmed: mean PCEs, disregarding 
sign, are greater with MP or MSS (p < .05; 
same test). 

The partial compensation for lengthening
when T_s is presented may partly explain the
greater lengthening shown during the course
of MP series in Experiments 2 and 4A, as
plotted in Figure 6. Another factor might be
that MP series, in which each stimulus requires
the same response, are more monotonous
than MR series, which require the subject
to alternate between recording T_s and
producing T_r. The more varied task might
more effectively maintain specific arousal.

7. An interesting problem arises in connection
with the effect on time judgments of the
intensity of auditory stimulation. Here there
are a number of apparently contradictory findings.
Hirsh, Bilger, and Deatherage (1956)
used MR with time intervals of 1-16 seconds
presented in random order, the stimulus conveying
the interval being a continuous light or
a 250-cycles-per-second, 80-decibel sound pressure
level tone. They added 90-decibel sound
pressure level white noise to the background
during either T_s or T_r and found that the
mean reproductions obtained in the first case
were shorter than in the second. Oléron
(1952) used MR with intervals of .35-1.40
second, T_s being a continuous 1,000-cycles-per-second
tone at 40, 70, or 90 decibels sensation
level, and T_r always at 70 decibels sensation
level. She found that an increase in the
intensity of T_s produced an increase in the
mean length of T_r. Goldstone, Boardman, and
Lhamon (1959), using the method of limits
without a standard, determined PSE for 1 second
when the duration was conveyed by a continuous
light or by a 725-cycles-per-second,
70-decibel sound pressure level tone; they
found that PSE was considerably smaller when
the tone was used. We have found, in Experiment 3,
using MR or MP with T_s and
T_r or T_p at 20, 50, or 80 decibels sensation
level, that, for T_s = .5 second, T_r or T_p decreased
as the intensity of the tone increased.

The last three results would be explained if
intense tones increase specific arousal. If a
loud tone is used the count made during T_s
will increase, and the duration of T_c, for a
constant n, will decrease. The retrieved measure
will also increase, but not as rapidly as
the counts made, since retrieval will be partly
determined by activation due to the verbal
label and any preceding less intense presentations
of T_s. Thus the effect of the intensity of
the tone during the production of T_c would
explain the decreases in T_r and T_p found in
Experiment 3, and the decrease in PSE found
by Goldstone, Boardman, and Lhamon. In
Oléron’s experiment the intensity of T_r was
constant, so that variation in T_r would depend
only on changes in the retrieved measure; we
would consequently expect the increase in T_r
which was produced by an increase in the
intensity of T_s. The failure to find an effect
for T_s = 3.0 seconds in Experiment 3 may
indicate that the increased specific arousal is
mainly induced during a short period following
the onset of the stimulus.

The results of Hirsh, Bilger, and Deather- 
age (1956) cannot, however, be explained in 
this way. In their experiment the presence of 
an intense auditory stimulus appears to have 
reduced specific arousal for intervals of 1-16 
seconds. The white noise they used was added 
to the background, rather than conveying the 
temporal information, as in the other experi- 
ments. Possibly in this case a suppressive 
mechanism evoked by and acting on back- 
ground interference also reduced specific 
arousal.

8. If specific arousal is related to more generally
recognized forms of arousal we might
expect drugs which increase arousal or produce
sedation to have an effect on time judgments;
the former should reduce q and the
latter should increase it. Goldstone, Boardman,
and Lhamon (1958), starting from the
hypothesis that time judgments depend on an 
“event counting process,” have made and con- 
firmed this prediction. They measured PSE 
for 1 second by the method of limits, no stand- 
ard being presented; subjects were then given 
dextro-amphetamine, quinal barbitone or a 
placebo, and PSE measured again. The first 
group showed a significant decrease, the sec- 
ond a significant increase, and the third a non- 
significant increase in PSE. This confirms that 
stimulants increase and sedatives decrease spe- 
cific arousal. It is of interest that this differ- 
ence was obtained although drowsiness was 
reported by subjects in all three groups and 
“none of the dextro-amphetamine group re- 
ported stimulation of mental or physiologic 
processes [p. 326].” 

9. In Experiments 1, 2, and 4 PCE was
algebraically larger for smaller values of T_s
in Block 1, indicating that q_1 > 1, where q_1
is the mean value of q in Block 1. If q_2 > q_1
the proportionate increase that would be predicted
for each interval in Block 2 is given by,

$$\frac{T_{c2}}{T_{c1}} = \frac{q_2(mn+r)-r}{q_1(mn+r)-r} \qquad [15]$$

For q_1, q_2, m, and r constant, this equation
implies that as T_s = nt_0 decreases T_{c2}/T_{c1}
will increase, giving the greater proportionate
lengthening at shorter intervals that has been
found experimentally.
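
The dependence of Equation 15 on interval length can be checked numerically. The sketch below is only an illustration: the parameter values are hypothetical (they are not estimates from the experiments), and the expression used for the interval, T_c = [q(mn + r) - r]t_0, is assumed here because it is the form implied by Equations 15 and 17.

```python
def produced_interval(q, m, n, r, t0):
    """Interval delimited by the comparator, assumed to be [q(mn + r) - r] * t0."""
    return (q * (m * n + r) - r) * t0


def lengthening_ratio(q1, q2, m, n, r):
    """Equation 15: ratio of the Block 2 to the Block 1 interval (t0 cancels)."""
    return (q2 * (m * n + r) - r) / (q1 * (m * n + r) - r)


# Hypothetical values chosen only for illustration: q rises from Block 1 to Block 2.
Q1, Q2, M, R = 1.05, 1.15, 0.95, 20.0

# n is the pulse count for the standard, so a smaller n means a shorter T_s.
ratios = {n: lengthening_ratio(Q1, Q2, M, n, R) for n in (25, 50, 100, 200)}

for n, ratio in ratios.items():
    print(f"n = {n:3d}  T_c2/T_c1 = {ratio:.4f}")
```

With q_2 > q_1 the ratio falls toward q_2/q_1 as n grows, so the proportionate lengthening is greatest at the shortest standards, which is the pattern the text describes.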

In ME if q_2 > q_1, the corresponding changes
to be expected in n_r in Block 2 are given by,

$$\frac{n_{r2}}{n_{r1}} = \frac{q_1(T_c + rt_0 - q_2rt_0)}{q_2(T_c + rt_0 - q_1rt_0)} \qquad [16]$$

This indicates that the estimates should be
less in Block 2 and that the decreases should
be proportionately greater for smaller values
of T_c. This was found in the Group H sessions
in Experiment 7. Its absence in the Group L
sessions, in which the estimates of T_c = 1.0
second were initially small, could be due to
the response mechanisms, the closeness of the
zero-second “response limit” producing resistance
to a reduction of estimates below about
half a second.

10. The absolute increase in Block 2 is
given by,

$$T_{c2} - T_{c1} = (q_2 - q_1)(mnt_0 + rt_0) \qquad [17]$$

This indicates that the absolute increase
should get larger as T_s increases, an effect
which was shown in Experiments 1A, 2 (MP),
and 4, though not in 1B (two sessions) and
2 (MR).

In ME if E = n_r t_0 + c, the absolute increase
in E in Block 2 will be -(q_2 - q_1)(T_c + a)/q_1q_2.
This indicates that the decrease should
be larger in magnitude as T_c increases. In the
Group H sessions in Experiment 7 the increases
for T_c = 1.0, 3.0, and 5.0 seconds
were -.54, -.48, and -.81 second, respectively.
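
The ME expressions above can be recovered in a few lines. The derivation below is a sketch under two assumptions that are not restated in this section: that the interval delimited by the comparator is T_c = [q(n_r + r) - r]t_0 (the form implied by Equations 15, 17, and 18, with n_r the retrieved count), and that the additive constant satisfies a = rt_0, which is what makes the last line agree with the expression in the text.

```latex
\begin{align*}
T_c &= \bigl[q(n_r + r) - r\bigr]t_0
  \;\Longrightarrow\;
  n_r = \frac{T_c + rt_0 - q\,rt_0}{q\,t_0}, \\[4pt]
\frac{n_{r2}}{n_{r1}}
  &= \frac{q_1\,(T_c + rt_0 - q_2\,rt_0)}{q_2\,(T_c + rt_0 - q_1\,rt_0)}, \\[4pt]
E_2 - E_1 &= (n_{r2} - n_{r1})\,t_0
  = (T_c + rt_0)\Bigl(\frac{1}{q_2} - \frac{1}{q_1}\Bigr)
  = -\,\frac{(q_2 - q_1)(T_c + rt_0)}{q_1 q_2}.
\end{align*}
```

The constant c in E = n_r t_0 + c cancels in the difference, so the predicted change in the estimate depends only on q_1, q_2, and T_c + rt_0.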

11. One implication of Equation 11 is that
an increase in q will alter the Weber function:
k decreasing, T_c increasing, and a remaining
constant (provided that σ_q does not
increase at the same rate). Weber functions
were computed for each block in Experiment 2
and also, since q apparently increased in successive
sessions, for each session. The values
of k and a obtained are given in Table 11.

The values of k clearly show the expected
decreases, and there is also some suggestion
that a increases, an effect which is not predicted.


TABLE 11
A TEST OF CERTAIN IMPLICATIONS OF EQUATION 11

Procedure          k        a

MR
  Block 1        .071     .500
  Block 2        .068     .579
  Session 1      .083     .584
  Session 2      .079     .380
  Session 3      .060     .586
  Session 4      .056     .603

MP
  Block 1        .078     .611
  Block 2        .070     .638
  Session 1      .102     .383
  Session 2      .077     .531
  Session 3      .069     .585
  Session 4      .052    1.213
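
The trend in k can be made concrete with a short calculation. This is a sketch only: it assumes the Weber function has the form ΔT = k(T + a), reads the Table 11 entries as decimal fractions with a in seconds, and uses function names that are purely illustrative.

```python
def weber_jnd(t, k, a):
    """Just noticeable difference under the assumed Weber function dT = k(T + a)."""
    return k * (t + a)


def weber_fraction(t, k, a):
    """Weber fraction dT / T for an interval of t seconds."""
    return weber_jnd(t, k, a) / t


# (k, a) for Sessions 1-4 of Experiment 2, read from Table 11 as decimals.
MR_SESSIONS = [(0.083, 0.584), (0.079, 0.380), (0.060, 0.586), (0.056, 0.603)]
MP_SESSIONS = [(0.102, 0.383), (0.077, 0.531), (0.069, 0.585), (0.052, 1.213)]


def k_decreases(sessions):
    """True if the slope constant k falls from every session to the next."""
    ks = [k for k, _ in sessions]
    return all(later < earlier for earlier, later in zip(ks, ks[1:]))


for label, sessions in (("MR", MR_SESSIONS), ("MP", MP_SESSIONS)):
    print(label, "k falls in every successive session:", k_decreases(sessions))
    for number, (k, a) in enumerate(sessions, start=1):
        fraction = weber_fraction(1.0, k, a)
        print(f"  Session {number}: dT/T at T = 1.0 s is {fraction:.3f}")
```

Because a is not constant across sessions, the Weber fraction at a fixed interval need not fall as steadily as k itself; the sketch prints both so the two can be compared.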

12. Equation 12 implies that when PCE is
zero, i.e., at the indifference interval,

$$m = \frac{n + r - rq}{nq} \qquad [18]$$

It follows that, for PCE to be zero, if q is
greater than 1, m must be less than 1, and
vice versa. For a constant value of q > 1 m
must increase as n increases. If for q > 1
and T_c = nt_0, m has the value given by Equation 18
then there will be zero constant error
for T_c, PCE will be positive for T_s < T_c, and
it will be negative for T_s > T_c. Thus change
in m will shift the indifference interval; intervals
less than this will still be overestimated
and longer ones underestimated.

For zero constant error, if q > 1 and constant
then m must increase as n increases, but
if q < 1, m must decrease as n increases. Thus
if T_s varies and m must be adjusted accordingly,
then if q varies about 1, the appropriate
direction of adjustment of m will vary correspondingly.
This added complexity in the
operation of the mechanism could be reduced
by making q vary about a quantity somewhat
greater or less than 1; as q seems to increase
more often than it decreases a starting point
above 1 would be preferable. These considerations
may explain the experimental finding
that overestimation of short and underestimation
of long intervals are much more common
than the opposite pattern.
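
The behavior described in this item can be illustrated numerically. The following is a minimal sketch, not the author's procedure: it assumes T_c = [q(mn + r) - r]t_0 and T_s = nt_0, takes PCE to be the percentage constant error 100(T_c - T_s)/T_s, and uses hypothetical values for q, r, and n.

```python
def indifference_m(q, n, r):
    """Equation 18: the value of m that gives zero constant error at count n."""
    return (n + r - r * q) / (n * q)


def pce(q, m, n, r):
    """Percentage constant error, assuming T_c = [q(mn + r) - r]t0 and T_s = n*t0.

    t0 cancels, so the error is expressed in pulse counts.
    """
    return 100.0 * (q * (m * n + r) - r - n) / n


# Hypothetical values for illustration only.
Q, R, N_INDIFFERENT = 1.2, 20.0, 100

m_fixed = indifference_m(Q, N_INDIFFERENT, R)
print(f"m for zero error at n = {N_INDIFFERENT}: {m_fixed:.3f}")

# Holding m at that value, shorter standards are overestimated and longer
# ones underestimated.
for n in (50, 100, 200):
    print(f"n = {n:3d}  PCE = {pce(Q, m_fixed, n, R):+.2f}")

# The m required for zero error rises with n when q > 1 and falls when q < 1.
print(indifference_m(1.2, 200, R) > indifference_m(1.2, 100, R))
print(indifference_m(0.8, 200, R) < indifference_m(0.8, 100, R))
```

Holding m fixed while n varies is what produces the indifference interval in this sketch; letting m track n through Equation 18 would remove the constant error at every interval.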

13. A change in q can be fully compensated
for by the appropriate change in log m, and
P(L) could be used as a source of information
which would allow this to be done (Postulate 7).
In Experiment 5, in which this information
was available, the subjects were
almost completely successful in suppressing
any effect of change in q. When P(L) was
made a less redundant information source, in
Experiment 6, lengthening reappeared.

14. Loci of activation in the store will also
affect the magnitude of log m (Postulate 8),
and these two sources of feedback may conflict.
In a long MCS or MSS series at a given
value of T_s feedback from P(L) would tend
to make PCE zero for that interval, and this
would be opposed only by the effect of resid- 
ual activation at other points in the store 
which would tend to maintain the average 
value of log m for the set of intervals. Since 
standard intervals may be most effective in 
producing activation (Postulate 4) the feed- 
back from P(L) should be more strongly re- 
sisted in MCS than in MSS series. This pre- 
diction is supported by Experiment 5 in which 
constant errors almost completely disappear 
with MSS, but a tendency for underestima- 
tion to be less at shorter intervals remains 
with MCS. 

15. The “time-order error” corresponds to 
the constant errors which have been discussed 
here; the effect of the interstimulus interval 
on it has often been studied (Guilford, 1954) 
and has sometimes been explained in terms of 
“fading traces.” The factors determining 
PCE in the present model have been de- 
scribed; there appears to be little reason to 
expect systematic shifts in the mode of the 
activation distribution as activation decays. 
However, the length of the interstimulus in- 
terval might affect the level of specific arousal, 
a rapid sequence of stimuli being more arous- 
ing and a long wait between stimuli less so. 
In Figure 8 data are plotted which are rele- 
vant to this. They have been taken from an 
experiment by Björkman and Holmkvist 
(1960) in which the method of adjustment 
was used, the second of a pair of intervals 
being adjusted to appear equal to the first. 
There were three values of the interstimulus
interval, .1, 1.5, and 4.0 seconds, only one
being used throughout each experimental session.
Values of PCE have been calculated for
each interstimulus interval; Equation 13 provides
a very satisfactory fit to them. These
results show that q is at first sufficiently small 
to give underestimation for the whole range 
of intervals, but that as the interstimulus 
interval increases it becomes larger—rapidly 
between .1 and 1.5 seconds and then more 
slowly. This confirms that features of pro- 
cedure can have short-term effects on specific 
arousal. 

16. In MCS P(L) can be determined for
each value of T_v from trials in which T_s always
precedes T_v, as in Experiments 5 and 6,
or T_v could always precede T_s. The resulting
curves have sometimes been compared, and it
has been found that they may intersect; the
point of intersection is sometimes taken as the
indifference interval (Stott, 1935; Woodrow,
1935; Woodrow & Stott, 1936). Intersection
occurs because the T_v-T_s curves are flatter
than the T_s-T_v curves. This difference is
explained by the present model. In each comparison
the first interval presented determines
a value of T_c which is compared with the
physical duration of the second interval. For
q > 1 and m constant, if T_v is first the values
of T_c produced will be dispersed over a smaller
range than the physical durations of the variable
stimuli, and this will result in a smaller
range of values of P(L) being obtained than
for T_s-T_v trials. (As T_c decreases in T_v-T_s
trials its standard deviation will also decrease,
but not sufficiently to compensate for this effect.)

Fig. 8. PCEs are plotted against the standard
time intervals, T_s (seconds); the theoretical curves
are derived from Equation 13. (The method of
adjustment was used; intervals of .1, 1.5, or 4.0
seconds intervened between the two stimuli. The
data are taken from an experiment of Björkman and
Holmkvist, 1960.)

The two curves will not necessarily intersect 
at the interval for which Ts = Te, so that 
identifying the point of intersection with the 
indifference interval is not supported by the 
present model. 


Assimilation. 


17. Various experiments have shown indifference
intervals in very different ranges of
time intervals. This is explained by Postulate 8.
The residual activation from a range
of presented intervals will determine an average
value of log m. This will tend to produce
an indifference interval within the range
of intervals, though this effect may be obscured
by changes in q. Postulate 8 can
also explain the sequential dependency effect
which was found in half the sessions of Experiment 5.
Subjects may not all, or always,
record and store each presented T_v, T_p, or T_r;
but if measures of T_v are read into the store,
then the activation produced by the most
recent stimulus should have relatively most
effect on log m. This should, therefore, vary
slightly as a function of the immediately preceding
stimulus. When the latter is large
log m will be slightly increased, and when it is
short it will be slightly decreased so that in
the first case T_c will be longer than in the
second, which would give the changes in P(L)
which were found. The preceding presentation
of T_v should have less effect in MCS than
in MSS, since in the former a presentation of
T_s intervenes; in Experiment 5 the assimilation
effect was detectable in one MCS and
three MSS sessions.

The same argument will account for the assimilation
in Experiments 2 and 4. Table 10
and Figure 6 show that much less effect was
produced on MR than on MP series. The
greater efficiency of presentations of T_s in
producing activation may explain why the
former series were less affected by variation
in the immediately preceding pattern of activation.

18. If assimilation is produced by changes
in m, it should also be shown by ΔT (see
Equation 11). Mean standard deviations for
all T_s = .5 second and T_s = 1.0 second series
preceded by series of longer or shorter intervals
in Experiments 2 and 4A are given in
Table 12. All differences are in the expected
direction though some are negligible. The effect
is more marked with MP than with MR.


TABLE 12
MEAN STANDARD DEVIATIONS FROM EXPERIMENTS 2 AND 4A

                                   Preceded by
Experiment    T_s (seconds)   Larger T_s   Smaller T_s

2, MR             1.0            .126         .120
                   .5            .090         .079
2, MP             1.0            .179         .178
                   .5            .095         .070
4A, MR            1.0            .132         .088
                   .5            .066         .064
4A, MP            1.0            .188         .110
                   .5            .125         .085


Discussion 


The model that has been presented is com- 
plex and to some extent arbitrary, although it 
necessarily embodies a number of simplifica- 
tions. Thus no allowance has been made for 
motor response time, or for any anticipatory 
mechanism compensating for it; nor for mo- 
tor response variance (though a comparison 
of the Weber functions for Experiments 2 and 
5 suggests that it does not contribute largely 
to the standard deviation); it has been assumed
that the times of occurrence of S_1 and
S_2 are determined without error; and variance
due to the verbal selective mechanism or the
decision process has been ignored. It is encouraging
that despite these approximations
the consequences of the model agree with ob- 
servation as well as they do. 

Once the initial inaccuracy of retrieval and 
the instability of the pacemaker have been 
assumed, other major assumptions, such as 
Postulates 3b and 6a, can be understood as 
functionally related attempts to compensate 
for these limitations. The model also serves 
to relate different experimental observations.
Thus the addition of r to the retrieved measure
enters into the explanation of the constant
a in the Weber function, the overestimation of
short and underestimation of long intervals,
and the proportionately greater lengthening
at short intervals. The model gives rise to
predictions which may be used to refute or 
confirm it, and suggests links with other areas 
of investigation. For example, the decrease in 
specific arousal during the course of a session 
invites comparison with the decline in vigi- 
lance often found in watch keeping tasks 
(Broadbent, 1958); should these phenomena 
prove to be related, temporal estimation tasks 
might prove useful as a method for assessing 
the state of vigilance of an observer. The
Weber function, overestimation of small and 
underestimation of large stimuli, and assimi- 
lation effects have been observed on many 
sensory dimensions (Guilford, 1954); if the 
present model proves useful, it may suggest 
analogous explanations of similar phenomena 
elsewhere. Thus if we assume that the mag- 
nitude of an area or length is indicated cen- 
trally by the total discharge produced and 
that this is partly due to the receptors stimu- 
lated, whose number is directly proportional 
to the area or length, and partly to a constant 
spontaneous discharge, and that variation in 
a source of specific arousal makes the respon- 
siveness of all the elements in the sensory 
pathway vary in a similar way so that the 
summed discharges are highly correlated, then 
a function analogous to Equation 11 is readily 
obtained.

There are plausible alternatives to a num- 
ber of the assumptions that have been made. 
Thus assimilation and central tendency effects 
have been explained by variation in log m, 
but some of these findings could have been 
explained by overlap and summation of the 
distributions of activation produced by differ- 
ent stimuli. These two explanations lead to 
different predictions in some cases so that they 
could be distinguished experimentally, but the 
present data are insufficient for this. 

As an alternative to a number of the present
assumptions it could be postulated that
σ_q is negligible; r is added to n before the
latter is read into the store; the stored measure,
log(n+r), varies spontaneously (this
would correspond to the distribution of activation)
with a constant variance, σ²_sm; and
lengthening is produced by an increase in
log m. The addition of r might be due to difficulty
in determining the exact time of occurrence
of S_1 and S_2, perhaps because the perceptual
input is segmented into “moments”
(Stroud, 1955), and r would then represent
the mean number of pulses in a moment. The
variance of the responses would be derived
from σ_sm, and the Weber function and a number
of the predictions obtained above would
be given by these assumptions. Some reasons
for preferring the model given earlier are:
(a) The values of a obtained experimentally
are considerably larger than the perceptual
moment, which is estimated to be 50-200 milliseconds
long (Stroud, 1955). (b) The assumption
that the Weber function slope, k,
is determined by σ_sm, which is constant, leads
to the prediction that k will remain constant
when lengthening occurs. This was not found
experimentally (see Table 11). (c) If r is
added before the measure is read into the 
store, it should affect every use made of the 
stored measures; but if r is added by the 
comparator, then “central” uses of time, em- 
ploying the stored measures but by-passing 
the comparator, should not reveal the presence 
of r. Observations which may be relevant 
here were made in experiments on the effect 
of a warning stimulus on the threshold for a 
subsequent critical stimulus (Howarth & 
Treisman, 1958; Treisman, 1964b). When 
the warning stimulus preceded the critical 
stimulus by a fixed interval, T, the length of 
which was known to the subject, it produced 
a fall in threshold which was greater the 
shorter the warning interval. This could be 
explained by the hypothesis that the subject 
lowered his threshold criterion as his uncer- 
tainty about when exactly the critical stimulus 
would come was decreased. This uncertainty 
would arise because he could not tell exactly 
when the interval, T, ended, but could only 
delimit a “range of uncertainty” within which 
it was likely to end; this range would decrease 
as T decreased. An argument based on signal 
detection theory led to the conclusion that 
the fall in threshold would be approximately 
inversely proportional to the range of uncer- 
tainty. 

In this threshold task measures of time
were used by the subject. We can interpret
his performance in terms of the present model
if we suppose that the extent of the range of
uncertainty is determined by the spread of
activation about the location which corresponds
to T, the expected duration of the
warning interval. Once the warning signal
has been received the counter continuously
reads into the store counts which measure the
interval since the signal. Thus the locations
in the store will be entered in succession,
starting from the zero point. When a location
is entered in this manner, in the absence of
any stimulus event, we suppose that no activation
is produced, but instead a measure
of the pre-existing activation at that point due
to the T distribution is taken and this measure
is used to weight the threshold criterion
for simultaneously arriving stimuli. As σ_sm
is constant, the range of uncertainty will be
proportional to (T+a) if r is added to n
before the store, but to T alone if it is added
after that point. Figure 9 shows the results
of five experiments in which T was 1, 2, 3, 4
or 9 seconds. In four experiments the absolute
threshold for the visual phosphene, and
in one experiment the auditory intensity difference
threshold, were measured. The mean
thresholds for the first four experiments have
been equated, and the means for the different
warning intervals are joined by a continuous
line. The auditory difference thresholds at
the same intervals are joined by a dashed
line. In Figure 9, Part A, the scale on the
abscissa is proportional to 1/T and in Part
B to 1/(T+a), a being given by Equation 3.
(Experiment 1 was designed to allow this
comparison.) The results come close to giving
a simple linear relation when 1/T is
used, but not with 1/(T+a). Thus when
measures of time are used “centrally,” i.e.,
without involving the decision and response
mechanisms, Weber’s law in its classic form
appears to give the best fit. This conforms
better with the model given earlier than with
the version based on σ_sm.

Fig. 9. Data are shown for five experiments in
which thresholds were measured at different intervals
after a warning stimulus (Howarth & Treisman,
1958). (Data points are plotted for four experiments
in which the absolute threshold for the visual
phosphene [measured in log microamperes] was
determined. The mean threshold for each experiment
has been equated, and continuous curves join the
means for each warning interval. The dashed curves
show the mean thresholds [in decibels] for the fifth
experiment, in which auditory intensity difference
thresholds were measured. In Part A the scale on
the abscissa is proportional to 1/T, and in Part B
it is proportional to 1/(T+a), where T is the warning
interval in seconds and a has the value given by
Equation 3.)


SUMMARY 


Time estimation has been investigated in 
experiments using the methods of production, 
reproduction, constant stimuli, single stimuli, 
and estimation. The Weber function was 
found to give a good fit to the relation between
ΔT and T; evidence was obtained suggesting
that the decrease in the Weber fraction
sometimes found at a short interval is
not an essential feature of the basic time
keeping mechanism, but may be due to rhythmic
modes of response or other factors. A tendency
was found for the intervals produced or
reproduced to lengthen during the course of
a session, at a rate proportionately greater
for shorter intervals. A corresponding tendency
for estimates to decrease was found
when the method of estimation was used. A
model for the internal clock is proposed, based 
on a pacemaker, counter, store, and compara- 
tor with defined functional relations which 
tend to reduce error; and it is shown that the 
model provides explanations for the Weber 
function, the indifference interval, the com- 
monly found pattern of overestimation of 
short and underestimation of long intervals, 
the features of “lengthening,” assimilation, 
and other findings. 




REFERENCES 


Byérxman, M., & Hotmxvist, O. The time-order 
error in the construction of a subjective time scale. 
Scand. J. Psychol., 1960, 1, 7-13. 

Bormo, E, G. Sensation and perception in the his- 
tory of experimental psychology. New York: 
Appleton-Century-Crofts, 1942. 

Broapgent, D. E. Perception and communication. 
London: Pergamon Press, 1958. 

Cravsen, J. An evaluation of experimental methods 
of time judgment. J. exp. Psychol., 1950, 40, 756- 
761. 

Davis, R. Time uncertainty and the estimation of 
time-intervals. Nature, 1962, 195, 311-312. 

EKMAN, G. Webers law and related functions. 
J. Psychol., 1959, 47, 343-352. 

Eson, M. E., & Karxa, J. S. Diagnostic implications 
of a study in time perception. J. gen. Psychol., 
1952, 46, 169-183. 

Farx, J. L., & Buvora, D. Judgment of time as a 
function of serial position and stress. J. exp. Psy- 
chol., 1954, 47, 279-282. 

Finney, D. J. Probit analysis. (2nd ed.) Cambridge: 
Cambridge Univer. Press, 1952. 

Fratsse, P. Les erreurs constantes dans la reproduc- 
tion de courts intervalles temporels. Arch. Psychol., 
Geneva, 1948, 32, 161-176. 

Gumanp, A. R, & Humpnreys, D. W. Age, sex, 
method, and interval as variables in time estima- 
tion. J. genet. Psychol., 1943, 63, 123-130. 

Gotpstong, S., Boarpan, W. K., & Luamon, W. T. 
Effect of quinal barbitone, dextro-amphetamine, 
and placebo on apparent time. Brit. J. Psychol., 
1958, 49, 324-328. 

Goxpstong, S., BoarpMan, W. K., & Luamon, W. T. 
Intersensory comparisons of temporal judgments. 
J. exp. Psychol., 1959, 57, 243-248. 

GoLDsTONE, S., Luamon, W. T., & BoarpMan, W. K. 
The time sense: Anchor effects and apparent dura- 
tion. J. Psychol., 1957, 44, 145-153. 

Grecory, R. L. An experimental treatment of vision 
as an information source and noisy channel. In 
E. C. Cherry (Ed.), Information theory: Third 
London symposium. London: Butterworth, 1956. 

Gumrorp, J. P. Psychometric methods. (2nd ed.) 
New York: McGraw-Hill, 1954. 

Hetson, H. Adaptation level theory. In S. Koch 
(Ed.), Psychology: A study of a science. Vol. 1. 
Sensory, perceptual, and physiological formulations. 
New York: McGraw-Hill, 1959. 

Henry, F. M. Discrimination of the duration of a 
sound. J. exp. Psychol., 1948, 38, 734-743. 

Hms, I. J., Boer, R. C., & DEATHERAGE, B. H. 
The effect of auditory and visual background on 
apparent duration. Amer. J. Psychol., 1956, 69, 
561-574. 

Hoactanp, H. The physiological control of judg- 
ments of duration: Evidence for a chemical clock. 
J. gen. Psychol., 1933, 9, 267-287. 


HorumcworTts, H. L. The central tendency of judg- 
ment. Arch. Psychol., N. Y., 1913, 4, 44-52. 

Howarts, C. I, & Treisman, M. The effect of 
warning interval on the electric phosphene and 
auditory thresholds. Quart. J. exp. Psychol., 1958, 
10, 130-141. 

Otéron, G. Influence de J’intensité d'un son sur 
Yestimation de sa durée apparente. Annee psychol., 
1952, 52, 383-392. 

Oswatp, I. Sleeping and waking. Amsterdam: Else- 
vier, 1962. 

Pum, B. R. The effect of interpolated and extra- 
polated stimuli on the time order error in the com- 
parison of temporal intervals. J. gen. Psychol., 
1947, 36, 173-187. 

Srecet, S. Nonparametric statistics for the behavioral 
sciences. New York: McGraw-Hill, 1956. 

Srorr, L. H. Time-order errors in the discrimination 
of short tonal durations. J. exp. Psychol., 1935, 
18, 741-766. 

Srroup, J. M. The fine structure of psychological 
time. In H. Quastler (Ed.), Information theory in 
psychology. Glencoe, Ill.: Free Press, 1955. 

TREISMAN, M. Psychological explanation: The “pri- 
vate data” hypothesis. Brit. J. Phil. Sci., 1962, 13, 
130-143. 

TREISMAN, M. Laws of sensory magnitude. Nature,
1963, 198, 914-915. 

TREISMAN, M. Sensory scaling and the psychophysi- 
cal law. Quart. J. exp. Psychol., 1964, in press. (a) 

TREISMAN, M. The effect of one stimulus on the 
threshold for another: An application of signal 
detection theory. Brit. J. statist. Psychol, 1964, in 
press. (b) 

TREISMAN, M., & Howarth, C. I. Changes in threshold
level produced by a signal preceding or following
the threshold stimulus. Quart. J. exp.
Psychol., 1959, 11, 129-142.

Turchioe, R. M. The relation of adjacent inhibitory
stimuli to the central tendency effect. J. gen. Psychol.,
1948, 39, 3-14.

Weber, A. O. Estimation of time. Psychol. Bull.,
1933, 30, 233-252. 

Woodrow, H. The reproduction of temporal intervals.
J. exp. Psychol., 1930, 13, 473-499.

Woodrow, H. The temporal indifference interval
determined by the method of mean error. J. exp. 
Psychol., 1934, 17, 167-188. 

Woodrow, H. The effect of practice upon time-order
errors in the comparison of temporal intervals.
Psychol. Rev., 1935, 42, 127-152. 

Woodrow, H. Time perception. In S. S. Stevens
(Ed.), Handbook of experimental psychology. New 
York: Wiley, 1951. 

Woodrow, H., & Stott, L. H. The effect of practice
on positive time-order errors. J. exp. Psychol.,
1936, 19, 694-705. 


(Received February 28, 1963) 


a: 
A-D session: 


d: 
D-A session: 


AT: 


AT/T: 


b: 
Pa, o(L), ete: 


PCE: 
Po(L): 


P(L): 
P,(L): 


P(S): 




APPENDIX 


GLOSSARY OF SYMBOLS 


additive constant in the Weber func- 
tion 

session in which an ascending block is 
followed by a descending block 

d = (mg—1) 

session in which a descending block is 
followed by an ascending block 

just noticeable difference in the dura- 
tion of a time interval, T 

Weber fraction 

verbal or written estimate of Te 

e = (q—1)rto 

slope constant in the Weber function 
response: T» is longer than T, 
value added to log » when it is re- 
trieved 

location in the store activated when 
the count, , is stored; stored measure 
stored measure whose retrieval gives 
the value of RM for which Te = Te 
method of constant stimuli 

method of estimation 

method of production 

method of reproduction 

method of single stimuli 

number of basic interpulse intervals 
corresponding to a standard interval; 
T, = nto 

n't! = nto 

number of basic interpulse intervals 
corresponding to Te; Te = neto 

a pulse produced by the pacemaker 
Po(L), given that the preceding T» has 
the value A, etc. 

proportionate constant error: 
(Tr—Ts)/Ts, (E—Te)/Te, etc. 
probability of the response L, given 
that Te = C 

probability of the response L 

Po(L), given that the preceding Te 
has evoked the response L 

the probability of the response S 


> 


"E 


3 


SA 


SIRTF ae 


Te: 


point of subjective equality 

Po(L), given that the preceding T» has 
evoked the response S 

measure of the effect of specific arousal 
on t: t' = qto 

mean value of q in Block 1 

mean value of q in Block 2 

additive constant included in the re- 
trieved measure 

the constant time taken for a pulse to 
travel from Point A to Point B on the 
pathway; p = fto 

retrieved measure 

response; T» is shorter than T: 
stimulus event initiating a time inter- 
val 

stimulus event terminating a time in- 
terval 

standard deviation 

variance of a stored measure 

variance of t 

variance of Te 

interpulse interval 

basic interpulse interval 

mean interpulse interval; t’ = qto 
time interval 

interval in which the counter com- 
pletes the count corresponding to a 
given RM 

time interval presented for verbal or 
written estimation 

mean value of Te for a series of trials 
production of a standard interval 
reproduction of a standard interval 
standard interval, presented as a physi- 
cal duration or named 

one of a range of intervals presented 
for comparison with Ts in MCS or 
MSS; each range contained five inter- 
vals, referred to in order of increasing 
duration as A, B, C, D, and E. Except 
in Experiment 6, C = Ts. 


Vol. 77, No. 14 Whole No. 577, 1963 


Psychological Monographs: General and Applied 


MMPI DECISION RULES FOR THE IDENTIFICATION 
OF COLLEGE MALADJUSTMENT: 
A DIGITAL COMPUTER APPROACH ¹


BENJAMIN KLEINMUNTZ 
Carnegie Institute of Technology 


In this study a set of decision rules was devised for interpreting the MMPI profile patterns of maladjusted and adjusted college students. The procedure used was that of computer programing of the “maladjusted” versus “adjusted” decisions of an expert test interpreter. The interpreter’s decision making processes were tape recorded while he was thinking aloud during the sorting of the profiles of 126 college students. The programed decision rules, which were based on the interpreter’s protocol and which were improved upon by a process of trial-and-error statistical checking, yielded a greater hit percent than the decisions of the original interpreter. In its final form the set of objective configural inventory rules identified correctly large numbers of maladjusted college students in 5 cross-validation samples.


PROBLEM


One of the personality inventories frequently used in psychiatric settings to aid in making diagnostic and prognostic decisions is the Minnesota Multiphasic Personality Inventory (MMPI). Since the construction of the MMPI by S. R. Hathaway and J. C. McKinley (1940) and the subsequent completion of the full complement of validity and clinical scales by McKinley, Hathaway, and Meehl (1948), the emphasis has gradually shifted from test interpretation based on single-scale analysis to interpretation using the configural properties of the test profile (Meehl, 1950). The most significant contribution toward the formalization of a set of objective configural rules for MMPI interpretation has recently been made by Meehl and Dahlstrom (1960). They devised a set of sequential decision making rules which successfully classified between 61 and 93% of the MMPI profiles of psychotic and neurotic psychiatric patients in eight cross-validation samples. A recent survey by the author of all research done with the MMPI in college settings (Kleinmuntz, 1962) indicates that attempts to use the MMPI for diagnostic decision making among college students have not met with equal success. In almost all studies in which diagnostic, prognostic, or screening predictions were tried the researchers used at the most two scales, but more often based their prediction on single-scale scores. Several studies completed by the author (Kleinmuntz, 1960a, 1960b) and others (Gough & Pemberton, 1952; Schofield, 1953) demonstrated that a profile configural approach to MMPI interpretation may be possible among college populations. Moreover, the author’s work (Kleinmuntz & Alexander, 1962) in programing the Meehl-Dahlstrom rules for the computer suggested that a digital computer approach to MMPI interpretation may be feasible.


¹ This study was supported in part by Grant M-5701 from the National Institute of Mental Health, United States Public Health Service. A summary version of this report appeared in Science (1963, 139, 416-418).

Grateful acknowledgement is hereby expressed to R. D. Shipp for valuable computational and computer programing assistance and to the following persons, who served as data collectors and Q sorters: A. B. Caldwell, R. Callis, M. D. Croner, A. Ivey, L. D. Goodstein, V. H. Jensen, M. Korman, P. E. Meehl, C. A. Parker, A. Rosen, J. D. Schein, W. Schofield, and F. L. Vance.


SPECIFIC PURPOSES OF THE PRESENT STUDY 


This study was designed with the following 
three specific purposes in mind: 

1. To develop a set of objective MMPI profile decision rules which would discriminate between maladjusted and adjusted college students. Such a set of rules should be an aid in the identification of, and possibly in the prediction of, emotional maladjustment among these students.

2. To demonstrate the feasibility of apply- 
ing a set of computer programed decision rules 
to mass data for the identification of emo- 
tional maladjustment in the general college 
population. 

3. To apply the electronic digital computer to clinical decision making, which is an entirely new area of intelligent problem solving for the computer.


The first two aims of this study are closely related to the broader goals of mental health work in colleges; namely, the early identification and prediction of emotional difficulties among college students and the subsequent delineation of the most appropriate counseling techniques in the treatment and guidance of these students. The third aim of this study is related to recent developments in research on human thinking. This development has been facilitated by the introduction of the electronic digital computer as a research tool in the behavioral sciences and by the demonstration that this tool is much more than just a machine which performs rapid arithmetical operations. Among those who have contributed most to this development have been Allen Newell and Herbert A. Simon of the Carnegie Institute of Technology and J. C. Shaw of the RAND Corporation (Newell, Shaw, & Simon, 1958). They have restored thinking to the center of the psychological stage, and they have provided considerable and impressive evidence that the digital computer, when appropriately programed, can carry out complex patterns of processes that closely parallel the processes observable in humans who are thinking (Newell, Shaw, & Simon, 1957; Newell & Simon, 1961).


METHOD 


Overview of the Design of the Study 


The design of the study included the following 
procedures: 


1. The criterion maladjusted and adjusted students were chosen from three general college populations: the counseling group, the no-counseling group, and the random-normal group. The first group (N = 65) included self-referrals to the Carnegie Institute of Technology Bureau of Measurement and Guidance. For a portion of this group (N = 37) the MMPIs which had been administered at the time they entered college were used for the study, and for the remainder of this group (N = 28) the MMPIs which had been administered at the time they sought counseling were used. In both subgroups, ratings by the consulted counselor and by a counselor who examined the case folder (not including any test results) determined each student’s maladjusted versus adjusted status. The no-counseling group (N = 31) consisted of fraternity and sorority members who were nominated by their peers as being either “least” (N = 17) or “most” (N = 14) adjusted. MMPIs were administered to this group under supervised research conditions. The random normals (N = 30) were chosen from a group of 825 entering freshmen who had MMPIs administered to them at orientation testing time.

2. The MMPI profiles of these groups (N = 126) were prepared for Q sorting, and 10 experienced MMPI users were compared for their ability to predict the maladjusted versus adjusted criterion. The “best” Q sorter (i.e., the one who achieved the highest hit percent) was then selected and was instructed to “think aloud” into a tape recorder while performing his Q sort task with several portions of the MMPI data. His tape recorded protocol was programed for the electronic digital computer and became the basis upon which more elaborate decision rules were built.


Subjects and Test Administration 


In the development of the objective rules for the 
identification of maladjustment the MMPI profiles 
of 126 students were used. The subjects were divided 
into three major groups, and the test administration 
conditions were essentially different for each of the 
groups. 

Counseling Group. There was a total of 65 students in this group, all of whom received counseling at the Bureau of Measurement and Guidance. This group can be divided as follows: (a) Orientation group: there were 37 students in this group (19 maladjusted and 18 adjusted), and these were students who had been administered the MMPI during orientation week prior to their entry to the Carnegie Institute. These students had contacted the Bureau of Measurement and Guidance within a period ranging from 1 to 3 months of their orientation testing. (b) New client group: 28 students comprised this sample (9 maladjusted and 19 adjusted), and they were administered the MMPI by the Bureau’s receptionist at the time that they sought counseling. Adjustment and maladjustment were defined in terms of the nature of the student’s problems. If a student’s presenting problem was primarily in the educational and/or vocational area, and if the consulted counselor felt that these areas were the primary focus during subsequent interviews, then the student was classified “adjusted.” On the other hand, if the student requested counseling for personal and/or emotional problems, and if these were judged to be of primary importance, then the student was rated “maladjusted.” In either instance a second counselor examined the case folders of the prospective criterion group students and rated them either “educational/vocational” or “emotional.” Only those students for whom there was interjudge agreement were accepted into their respective groups. It is possible, of course, with such a criterion of maladjustment that a number of MMPI profiles of well-adjusted students might be included in the maladjusted group, but if there is an error here, it would be in the direction of making it more difficult for the decision rules to hold up on cross validation. The converse is true also; that is, it is possible that MMPI profiles of maladjusted students might be included in the adjusted group; and again this should increase the cross-validation shrinkage.

No-Counseling Group. In order to counterbalance the effects of personality test set, and in an effort to locate potentially maladjusted students who, for one reason or another, may never contact a counseling center, a no-counseling group was used.

There were 31 students in this group (17 malad- 
justed and 14 adjusted), and these were obtained from 
fraternities and sororities on campus. Each fraternity 
and each sorority at the Carnegie Institute partici- 
pated in this part of the study. In order to obtain 
judgments about the maladjustment or adjustment 
status of a fraternity or sorority member, a peer 
nomination technique was used. At one of the weekly 
meetings of fraternities and sororities, members who 
attended the meeting were instructed as follows: 


A campus-wide study is being conducted to assess the ability of students to accurately rate their fellow students.

A list of the names of your fraternity brothers (or sorority sisters) is before you and you are to nominate the three “most adjusted” and the three “least adjusted” members from this list.

In order to help you in your nominations the following two definitions should be kept in mind:

Most adjusted: A person who gets along well with others, and is cheerful, optimistic, sociable, mature, peaceable, adaptable, self-confident, relaxed, self-controlled and trustful.

Least adjusted: A person who does not get along well with others, and who may be depressed, gloomy, pessimistic, seclusive, pugnacious, inflexible, self-distrusting, high-strung, emotional, un-self-controlled, or suspicious.


After a frequency count of the nominations for 
most and least adjusted was made it was decided to 
retain students as prospective subjects for this study 
if their nomination for either the most or the least 
adjusted categories exceeded 60%. In this way a total 
of 55 students was chosen and these were contacted 
by telephone to appear at an agreed upon time to 
participate as subjects in test administration. In order 
to insure against subject attrition, each student was 
promised and paid at an hourly rate of $1.25. As 
can be noted from the size of the actual group (31 
students), there was some subject attrition. 
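
The retention rule just described can be sketched in a few lines of Python. This is a minimal illustration only. The text does not say what the 60% is a percentage of, so the sketch assumes it is the share of voting members who named a given student in one category. The names, the ballots, and the function `retained_nominees` are all hypothetical.

```python
from collections import Counter

def retained_nominees(ballots, threshold=0.60):
    """Return names nominated by more than `threshold` of the voters.

    `ballots` is a list of lists; each inner list holds the names one
    voter nominated for a single category (e.g., "least adjusted").
    Assumption: the 60% criterion is the share of voters naming a student.
    """
    n_voters = len(ballots)
    # Count each name at most once per ballot.
    counts = Counter(name for ballot in ballots for name in set(ballot))
    return sorted(name for name, c in counts.items() if c / n_voters > threshold)

# Hypothetical chapter with five voters, three nominations each.
least_adjusted_ballots = [
    ["Adams", "Baker", "Clark"],
    ["Adams", "Baker", "Davis"],
    ["Adams", "Clark", "Evans"],
    ["Adams", "Baker", "Evans"],
    ["Baker", "Clark", "Davis"],
]
retained = retained_nominees(least_adjusted_ballots)
print(retained)
```

Under this reading a student named by exactly 60% of the voters is not retained, because the text says the nominations must exceed 60%.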

Random-Normal Group. The MMPI profiles of 30 students (15 males and 15 females) were selected by entering a table of random numbers from the files of available entering freshmen’s tests. A routine search was conducted for each of the profiles randomly chosen in order to insure that these students had not contacted the counseling center for any help and that their profiles did not appear in any of the above groups. This group was included in the criterion adjusted sample in order to aid in approximating the Carnegie Institute’s base rates of the number of students who contact the Bureau of Measurement and Guidance during their 4-year stay. About 40% of the student body at one time or another seeks some form of counseling at the Bureau. In our sample there is about a 50% split between counseled and noncounseled students (N = 65 versus N = 61, respectively). A base rate not easily obtainable is the proportion of “maladjusted” students who enter Carnegie each year. The 36% maladjusted (N = 45) obtained in the total criterion sample (N = 126) is probably an overestimate.
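
The sample proportions quoted above follow directly from the group sizes given in the text. The short sketch below simply redoes that arithmetic; the variable names are illustrative and nothing in it goes beyond the reported counts.

```python
# Group sizes as reported in the text (maladjusted, adjusted).
orientation = (19, 18)
new_client = (9, 19)
no_counseling = (17, 14)
random_normal = (0, 30)  # all 30 were placed in the criterion adjusted sample

groups = [orientation, new_client, no_counseling, random_normal]

total = sum(m + a for m, a in groups)
maladjusted_total = sum(m for m, _ in groups)
counseled = sum(orientation) + sum(new_client)
noncounseled = sum(no_counseling) + sum(random_normal)

pct_maladjusted = 100 * maladjusted_total / total
pct_counseled = 100 * counseled / total

print(total, maladjusted_total, counseled, noncounseled)
print(round(pct_maladjusted, 1), round(pct_counseled, 1))
```

The counseled share of the sample (65 of 126) is somewhat above the roughly 40% of the student body said to seek counseling, which is one reason the sample figures should not be read as campus base rates.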




MMPI Q Sorts 


The Q sort technique developed by Stephenson 
(1953) was selected as the method of choice for the 
adjusted versus maladjusted classification of MMPI 
profiles. Conventional use of Stephenson’s method 
calls for the preparation of a set of phrases covering, 
for example, certain personality traits that may 
characterize the individual(s) on whom one may be 
interested in obtaining a description. These phrases 
are then prepared on a set of separate cards and the 
rater is told to sort the cards by placing a specified 
number of cards in each of several piles. The number
of cards and the number of piles, of course, vary
from one study to the next; but an invariant of the
method is that the number of cards must be placed
along the continuum of a forced normal distribution.
In this study, instead of using cards on which were 
printed phrases or statements, the MMPI profiles 
were placed on the cards. The Q sort technique was 
chosen in this study because the data obtained this 
way lend themselves to analyses which were deemed 
most essential for the accomplishment of our goals. 
For example, in order to facilitate computer pro- 
graming of the MMPI interpreter’s decisions, unam- 
biguous statements of the way he utilized profile in- 
formation were needed; and the Q sort method, 
because it allowed the test interpreter to shift cards 
back and forth and because it is a multistep rating 
scale, was ideal for yielding such information. The 
Q sort technique was a convenient way of rating 
MMPI profiles because the classification task was 
not an either-or affair, but rather involved “least” to 
“most” adjusted classificatory decisions. In other 
words, there were gradations of adjusted and mal- 
adjusted MMPIs rather than a clear-cut adjustment- 
maladjustment dichotomy. Moreover, Stephenson’s 
Q sort technique was valuable in that it lent 
itself to the types of statistical computations that 
were needed. For example, it was essential to compute
sort/re-sort reliability coefficients and validity
coefficients, and these are easily manageable with the
chosen technique. Specifically, the Q sort procedure
used was as follows: Each of the 126 MMPI pro- 
files was prepared for Q sorting by placing the pro- 
files on 434 X 4%4 inch cards. On these profile sheets 
the T scores for each of 16 scales were presented. 
The following scales were used: the four validity 
scales, ?, L, F, and K; and the clinical scales, Hs, 
D, Hy, Pd, Mf, Pa, Pt, Sc, Ma, and Si. In addi- 
tion to these, Barron’s (1953) Ego Strength (Es) 
scale and Kleinmuntz’ (1960b) College Maladjustment 
(Mt) scale were used. As an illustration of the 
nature of the Q sort sample with which the MMPI 
interpreters were confronted, the mean T scores (K 
corrected) of the adjusted and maladjusted students 
are presented in Table 1. The Es and Mt scales were
included and these are reported in raw scores. 

The profile cards were sorted by 10 experienced 
MMPI users who were instructed to place the 126 
individual profiles on a 14-step forced-normal dis- 
tribution from “least” to “most” adjusted. The 
sorters were instructed to start with two piles and 
then to fan out from the middle until they completed 
14 piles. Such a distribution, after sorting, is illus- 
trated in Table 2. In analyzing the hit and miss 
percents of the experts’ Q sorts, a cutoff line between 
maladjustment and adjustment was arbitrarily drawn 
at the middle of the distribution. Accordingly, Piles 
1-7 (N=63) were considered maladjusted and 
Piles 8-14 (N=63) were considered adjusted in the 
analysis of the Q sorts. 
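
The scoring procedure just described can be sketched as follows. This is an illustration under stated assumptions, not the study's own program. The pile sizes are those of Table 2, with the middle entries filled in by symmetry so that Piles 1-7 hold the 63 profiles the text reports (an inference). The ranking and the set of maladjusted profiles are synthetic, arranged so that the result matches the expert's first-sort percents; they are not the study's data. The function names are hypothetical.

```python
# Pile sizes from least to most adjusted (14-step forced-normal distribution).
PILE_SIZES = [2, 3, 4, 9, 12, 15, 18, 18, 15, 12, 9, 4, 3, 2]

def assign_piles(ranked_ids, pile_sizes):
    """Map each profile id (ranked least to most adjusted) to a 1-based pile."""
    if len(ranked_ids) != sum(pile_sizes):
        raise ValueError("ranking length must equal the total pile capacity")
    piles, start = {}, 0
    for pile_no, size in enumerate(pile_sizes, start=1):
        for pid in ranked_ids[start:start + size]:
            piles[pid] = pile_no
        start += size
    return piles

def hit_miss_percents(piles, maladjusted_ids, cutoff_pile):
    """Profiles in piles 1..cutoff_pile are called maladjusted."""
    called_mal = {pid for pid, pile in piles.items() if pile <= cutoff_pile}
    mal = set(maladjusted_ids)
    adj = set(piles) - mal
    vp = len(called_mal & mal)   # maladjusted, called maladjusted
    fp = len(called_mal & adj)   # adjusted, called maladjusted
    return {
        "valid_positive": 100 * vp / len(mal),
        "false_negative": 100 * (len(mal) - vp) / len(mal),
        "valid_negative": 100 * (len(adj) - fp) / len(adj),
        "false_positive": 100 * fp / len(adj),
    }

# Synthetic example: 126 profiles, 45 of them maladjusted by the criterion.
ranking = list(range(126))
maladjusted = set(range(36)) | set(range(63, 72))
piles = assign_piles(ranking, PILE_SIZES)
result = hit_miss_percents(piles, maladjusted, cutoff_pile=7)
print({k: round(v) for k, v in result.items()})
```

Because the cutoff forces exactly 63 profiles into the maladjusted half while only 45 profiles are maladjusted by the criterion, at least 18 adjusted profiles must be called maladjusted however well the sorter performs.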


RESULTS 
Analysis of Q Sort Data 


On the basis of having obtained the highest 
hit percent on the combined sample of 126 
profiles, one of the MMPI experts was selected 
for more intensive study. This expert’s hit 
percent, expressed in terms of valid and false 
positive and negative categories, appears in 
Table 3. His sort/re-sort reliability coefficient 
(Pearson product-moment) and the point- 
biserial correlations between his sorts and the 
maladjustment-adjustment criterion are also 
presented in Table 3. 

TABLE 1

MEAN T SCORE VALUES FOR MMPI SCALES OF THE CRITERION GROUP

                   Male                      Female
MMPI       Adjusted   Maladjusted    Adjusted   Maladjusted
scale      (N = 48)    (N = 24)      (N = 33)    (N = 21)
?            50.0        51.0          50.0        50.9
L            48.1        47.6          48.1        46.0
F            51.8        58.5          52.5        60.9
K            55.9        54.4          57.5        49.4
Hs           50.7        55.3          52.8        54.7
D            53.3        65.2          51.6        64.0
Hy           56.5        61.1          58.0        61.9
Pd           55.0        63.1          57.4        65.8
Mf           56.0        65.0          45.0        45.2
Pa           53.8        59.4          55.0        67.3
Pt           53.4        64.2          56.0        64.0
Sc           54.1        65.5          56.4        66.4
Ma           57.0        60.0          58.2        63.4
Si           47.9        54.4          48.5        55.1
Es*          52.0        40.0          46.8        42.0
Mt           10.7        18.1          11.7        23.5

Note. N = 126.
* Raw score.


From the results reported in Table 3 it can be seen that in the expert’s first performance he was able to predict maladjustment and adjustment correctly in 80 and 67% of the MMPI profiles, respectively; he misclassified 33% of the adjusted group by calling them maladjusted (false positives) and incorrectly classified 20% of the maladjusted profiles into the adjusted category (false negatives). His second sort closely approximated his first performance, and this is reflected by similar hit and miss percents and is seen most clearly from the substantial sort/re-sort reliability coefficient of .96. The latter correlation strongly suggests that the sorter’s overall performance on the two occasions was highly consistent. The point-biserial correlations of .53 and .49 between the two sorts and the maladjusted-adjusted criterion may be considered to be substantial validity coefficients and are additional evidence of the sorter’s accurate performance.
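
The two kinds of coefficient reported for the sorter can be illustrated with a small sketch. It assumes the usual definitions: the reliability coefficient is the Pearson product-moment correlation between pile placements on the two sorts, and the validity coefficient is the point-biserial correlation between pile placement and the dichotomous criterion. The eight placements below are invented toy data, not values from the study, and the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(scores, group):
    """Point-biserial r between scores and a 0/1 group code.

    Uses (M1 - M0) / s * sqrt(p * q), with s the standard deviation of
    all scores computed with n in the denominator.
    """
    n = len(scores)
    ones = [s for s, g in zip(scores, group) if g == 1]
    zeros = [s for s, g in zip(scores, group) if g == 0]
    p = len(ones) / n
    mean_all = sum(scores) / n
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return diff / sd * math.sqrt(p * (1 - p))

# Toy data: pile numbers (higher = more adjusted) for eight profiles.
first_sort = [1, 2, 2, 3, 4, 5, 5, 6]
second_sort = [1, 2, 3, 3, 4, 4, 5, 6]
adjusted = [0, 0, 0, 1, 0, 1, 1, 1]  # criterion: 1 = adjusted, 0 = maladjusted

reliability = pearson_r(first_sort, second_sort)
validity = point_biserial(first_sort, adjusted)
print(round(reliability, 2), round(validity, 2))
```

The point-biserial coefficient is numerically identical to a Pearson correlation computed with the criterion coded 0 and 1, which is why both kinds of coefficient can be read on the same scale.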


TABLE 2

FOURTEEN-STEP FORCED-NORMAL DISTRIBUTION (Q SORT) OF 126 MMPI PROFILES

                     Least adjusted                               Most adjusted
Pile                   1   2   3   4   5   6   7   8   9  10  11  12  13  14
Number of profiles     2   3   4   9  12  15  18  18  15  12   9   4   3   2




TABLE 3

PERCENTAGE OF HITS AND MISSES AND VALIDITY AND RELIABILITY COEFFICIENTS OF MMPI Q SORTER

                     Hit %                    Miss %
             Valid       Valid        False       False
Q sort       positive    negative     positive    negative
First          80          67           33          20
Second         76          64           36          24

Correlation                      First     Second    Sort/re-
coefficient                      sort      sort      sort
Correlations with criterion
  (point biserial)                .53       .49
Reliability (Pearson
  product moment)                                      .96

Note. Total sample of 126 profiles.


TABLE 4

MEAN T SCORE VALUES ON MMPI SCALES FOR THE NEW CLIENT GROUP

                   Male                      Female
MMPI       Adjusted   Maladjusted    Adjusted   Maladjusted
scale      (N = 14)    (N = 5)       (N = 5)     (N = 4)
?            50.0        50.0          50.0        50.0
L            46.7        53.4          46.6        52.2
F            54.6        60.6          52.4        57.7
K            52.6        52.4          52.2        48.2
Hs           47.6        51.6          47.8        63.0
D            62.1        65.8          58.0        72.0
Hy           56.1        58.2          56.0        66.2
Pd       [illegible] [illegible]   [illegible] [illegible]
Mf       [illegible] [illegible]   [illegible] [illegible]
Pa           52.3        63.8          52.4        67.2
Pt           57.7        65.4          52.2        62.5
Sc           57.7        65.2          51.4        61.0
Ma           59.9        56.8          52.6        63.5
Si           52.9        60.0          50.6        53.7
Es           50.6        49.2          47.6     [illegible]
Mt       [illegible] [illegible]   [illegible] [illegible]

Note. N = 28.


The expert was then asked to Q sort the MMPI profiles of several subdivisions of the criterion group, and he was instructed to “think aloud” into a tape recorder as he proceeded. Since it was essential to obtain clear statements of the way the expert used profile information during Q sorting, the large criterion group was broken up into separate subgroups and the sorter was asked to re-sort the data at least once. The following subgroupings were used.

Maladjusted and Adjusted Counseling 
Clients: Group 1 (New Client Profiles). 
There were 28 clients in this group (9 malad- 
justed and 19 adjusted), and these were the 
students who were administered the MMPI 
at the time they sought counseling. The 
MMPI expert sorted their profiles along a 
6-step forced-normal distribution on 2 suc- 
cessive days. The piles, from least to most 
adjusted, respectively, consisted of 2, 5, 7, 
7, 5, and 2 profiles; and the cutoff line was 
drawn down the middle of the distribution. 
The mean scores on the MMPI scales are presented
in Table 4; and his record of hit percents,
sort/re-sort reliability, and the point-biserial
correlation between his first and
second sorts appear in Table 5.

A comparison between the data reported in Tables 5 and 3, respectively, indicates that the expert achieved hit and miss percents on the subsample of students (N = 28) that were similar to his performance on the total sample (N = 126). However, he was not as consistent in his sort/re-sort behavior on the smaller sample as he was on the total group, as is borne out by the lower correlation coefficient obtained in the former group (.88). The correlations of .38 and .43 between the Q sorts and the criterion suggest that the sorter achieved less predictive accuracy in the smaller sample.


TABLE 5

PERCENTAGE OF HITS AND MISSES AND VALIDITY AND RELIABILITY COEFFICIENTS OF MMPI Q SORTER

                     Hit %                    Miss %
             Valid       Valid        False       False
Q sort       positive    negative     positive    negative
First          78          63           37          22
Second         78          63           37          22

Correlation                      First     Second    Sort/re-
coefficient                      sort      sort      sort
Correlations with criterion
  (point biserial)                .38       .43
Reliability (Pearson
  product moment)                                      .88

Note. N = 28 new client group profiles.




Maladjusted and Adjusted Counseling 
Clients: Group 2 (Orientation Profiles). 
There were 37 subjects in this group (19 mal- 
adjusted and 18 adjusted), and they differ 
from Group 1 above in that their tests were 
administered during freshman orientation 
week. A period ranging from approximately 
1 week to 3 months elapsed between the test 
session and their contact with the counseling 
center. 

The expert was asked to sort these profiles 
on an 8-step forced-normal distribution which, 
from least to most adjusted, consisted of 1, 3, 
6, 9, 9, 6, 3, and 1 piles. Again, the cutoff 
point was in the middle of the distribution. 
There was a 3-day interval between sorts. 
The mean profile scores are presented in 
Table 6, and his results appear in Table 7. 

TABLE 6

MEAN T SCORE VALUES FOR MMPI SCALES OF THE ORIENTATION GROUP

                   Male                      Female
MMPI       Adjusted   Maladjusted    Adjusted   Maladjusted
scale      (N = 11)    (N = 8)       (N = 7)     (N = 11)
?            50.0        50.0          50.0        50.0
L            46.1        43.9          51.0        45.5
F            50.1        59.1          52.6        62.9
K            54.9        52.5          58.7        51.3
Hs           52.5        53.5          54.4        53.5
D            52.4        74.2          55.0        64.2
Hy           56.0        61.7          60.7        63.0
Pd           56.9        65.5          60.1        68.7
Mf           50.8        71.2          41.7        45.9
Pa           53.6        63.4          60.1        71.5
Pt           52.5        70.6          57.7        68.5
Sc           51.0        71.5          58.4        73.6
Ma           54.4        58.2          56.4        66.0
Si           45.9        58.2          48.3        55.1
Es*          51.1        45.2          46.9        43.4
Mt            9.5        21.1          12.4        23.1

Note. N = 37.
* Raw score.


TABLE 7

PERCENTAGE OF HITS AND MISSES AND VALIDITY AND RELIABILITY COEFFICIENTS OF MMPI Q SORTER

                     Hit %                    Miss %
             Valid       Valid        False       False
Q sort       positive    negative     positive    negative
First          84          83           17          16
Second         84          83           17          16

Correlation                      First     Second    Sort/re-
coefficient                      sort      sort      sort
Correlations with criterion
  (point biserial)                .70       .67
Reliability (Pearson
  product moment)                                      .94

Note. N = 37 counseling group profiles.


The results in Table 7, which were obtained from the Q sorts with the 37 orientation MMPI profiles, are markedly superior to previous findings with respect to the high hit percents, the low miss percents, and the high correlations with the criterion. The sort/re-sort reliability is about as high as the one reported for the large sample and is higher than the coefficient achieved with the 28 MMPI new client profiles. Inspection of Table 6 and a comparison of the mean MMPI scores presented there with those presented in Table 4 should serve as an explanation of the relatively higher hit percents achieved on this sample of orientation profiles; namely, there were more and higher clinical scale elevations in the sample presented in Table 6. In other words, the Q sorter had an easier task to perform in the orientation sample than in the new client group, and this is reflected in higher hit percents and in higher criterion and sort/re-sort correlation coefficients.

Combined Counseling Group: Group 3. 
There were 65 MMPIs in this sample (Groups 
1 and 2, above) and the sorter’s task was to 
build a 10-step forced-normal distribution 
which consisted of 1, 2, 6, 10, 13, 14, 10, 6, 2, 
and 1 piles of profiles. The mean profile 
scores of this group are presented in Table 8; 
and the Q sorter’s performance is shown in 
Table 9. 

The hit and miss percents and the correla- 
tions with the criterion reported in Table D; 
which are based on Ọ sorts done with the com- 
bined counseling group (N =65), are of an 
order of magnitude as one might predict from 
an inspection of results obtained from the two 
contributing samples. For example, if one 
examines the highest positive percents ob- 
tained on the sample of 28 profiles (e.g., 78%) 
and on the sample of 37 profiles (e.g., 84%), 
one might expect and does find that the com- 


COMPUTER IDENTIFICATION OF MALADJUSTMENT a 7 


TABLE 8 


Mean T Score Vatues FoR MMPI SCALES OF THE 
Comsryep COUNSELING GROUP 


Male Female 
Mal- Mal- 
MMPI Adjusted adjusted Adjusted adjusted 

scale (N = 25) (N = 13) (N =12) (N= 15) 
? 50.0 50.0 50.0 50.0 
L 46.4 47.5 49.2 41.3 
F 52.6 59.7 52.5 61.5 
K 53.6 52.5 56.0 50.5 
Hs 49.8 52.8 51.7 56.0 
D 57.8 71.0 56.2 66.3 
Hy 56.0 60.4 58.7 63.9 
Pd 55.3 64.4 58.6 68.7 
Mf 56.3 69.7 43.7 44.3 
Pa 52.9 63.5 56.9 70.3 
Pt 55.4 68.6 55.4 66.9 
Se 54.8 69.1 55.5 70.3 
Ma 574 57.7 54.8 65.3 
Si 49.8 58.9 49.2 54.7 
Es* 50.8 46.8 47.2 42.9 
Mt* 13.2 20.5 13.7 23.9 
Note.—N = 65. 

a Raw score, 


bined samples’ valid positive percent is some- 
where between 78-84% (e.g. 79%). The 
same observation might be made in a com- 
parison of the percents of valid negatives, and 
the false positives and negatives achieved on 
the two subsamples, with the equivalent per- 
cents of the larger group; and this is true also 
of the range of correlations with the criterion. 
The mean MMPI T scores presented in Table 


TABLE 9 


Percentace oF Hrrs AND Misses AND CRITERION AND 
RELIABILITY COEFFICIENTS OF MMPI Q Sorter 


Hit % Miss %o 
Valid Valid False False 
Q sort positive negative positive negative 
First 75 70 30 25 
Second 79 73 27 21 
Correlation First Second - Sort/re- 
coefficient sort sort sort 
Correlations with 
criterion (point 
biserial) 56 60 
Reliability (Pearson 
product moment) 95 


Note.—N = 65 combined counseling group profiles. 


8, when considered together with those of 
Tables 4 and 6, reflect the fact that the MMPI 
interpreter was confronted with a task that
was midway in difficulty between the tasks of
sorting the orientation and the new client profiles.

The No-Counseling Group (Fraternity and 
Sorority): Group 4. The MMPI scores for 
the fraternity-sorority sample are shown in 
Table 10, and the expert’s results from his 
sorting of the 31 profiles of this sample (17 
maladjusted and 14 adjusted) along an 8-step 
normal distribution are presented in Table 11. 
The sorts were done on 2 successive days, and 
the number of profiles in the piles from least to 
most adjusted were 1, 2, 5, 8, 7, 5, 2, and 1, 
respectively. 

Inspection of Table 11 suggests which sam- 
ple is contributing the greatest amount of 
error variance to the hit percents of the large 
sample (N = 126). The no-counseling
fraternity-sorority group was by far the most
difficult sample to Q sort. The expert's tape
recorded protocol and the mean scale scores (see
Table 10) attest amply to this fact. The hit 
percents are the lowest and the miss percents 
the highest so far reported. The criterion and 
the sort/re-sort coefficients are likewise the 
lowest thus far obtained. 
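
The criterion coefficients reported in these tables are point-biserial correlations between a profile's pile position and its dichotomous criterion classification. A minimal sketch of that computation follows; the function name and the eight-profile toy data are hypothetical and are not drawn from the study, and the population standard deviation is assumed so that the result equals the ordinary Pearson coefficient.

```python
from math import sqrt


def point_biserial(scores, criterion):
    """Point-biserial r between a numeric score and a 0/1 criterion.

    scores    -- e.g., Q-sort pile positions (hypothetical usage)
    criterion -- 1 for one criterion class, 0 for the other
    """
    n = len(scores)
    ones = [x for x, c in zip(scores, criterion) if c == 1]
    zeros = [x for x, c in zip(scores, criterion) if c == 0]
    if not ones or not zeros:
        raise ValueError("both criterion classes must be represented")
    mean_all = sum(scores) / n
    # Population SD (divide by n); assumed here so r matches Pearson's r.
    sd = sqrt(sum((x - mean_all) ** 2 for x in scores) / n)
    if sd == 0:
        raise ValueError("scores must vary")
    p = len(ones) / n
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / sd * sqrt(p * (1 - p))


# Hypothetical toy data: pile 1 = least adjusted, pile 8 = most adjusted.
piles = [1, 2, 3, 4, 5, 6, 7, 8]
is_adjusted = [0, 0, 0, 1, 0, 1, 1, 1]
r = point_biserial(piles, is_adjusted)
print(round(r, 2))
```

A coefficient near zero would indicate that pile position carries little information about the criterion, which is the sense in which the fraternity-sorority coefficients are described as the lowest obtained.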


TABLE 10

Mean T Score Values for MMPI Scales of the
Fraternity-Sorority Sample

                    Male                      Female
MMPI        Adjusted   Maladjusted    Adjusted   Maladjusted
scale       (N = 8)     (N = 11)      (N = 6)     (N = 6)
?             50.0        50.0          50.0        50.0
L             47.1        47.8          46.7        42.7
F             48.6        57.2          51.3        59.2
K             59.5        56.6          60.3        46.8
Hs            52.9        58.4          48.8        51.5
D             46.7        58.5          46.3        58.5
Hy            62.9        61.9          55.3        57.0
Pd            53.2        61.5          52.8        58.5
Mf            62.2        59.5          44.3        47.3
Pa            60.4        54.5          54.5        59.7
Pt            52.2        59.1          56.3        56.7
Sc            54.6        61.2          56.3        56.7
Ma            57.2        62.7          58.0        58.5
Si            42.1        49.2          44.7        56.0
Esᵃ           55.9        51.6          47.0        39.7
Mtᵃ            8.5        15.2           8.7        22.3

Note. N = 31.
ᵃ Raw score.


TABLE 11

Percentage of Hits and Misses and Criterion and
Reliability Coefficients of MMPI Q Sorter

                   Hit %                   Miss %
Q sort      Valid       Valid        False       False
            positive    negative     positive    negative
First         65          64           36          35
Second        71          71           29          29

Correlation                      First    Second    Sort/
coefficient                      sort     sort      re-sort
Correlations with criterion
  (point biserial)                .41      .37
Reliability (Pearson
  product moment)                                     .85

Note. N = 31 fraternity and sorority group profiles.


Combined Counseling and No-Counseling 
Groups: Group 5. The Q sort of this sample 
included the profiles of all of the above groups 
and consisted of 96 MMPIs. These profiles
were sorted along a 12-step distribution and 
the piles, from least to most adjusted, re- 
spectively, consisted of 2, 5, 6, 10, 12, 13, 13, 
12, 10, 6, 5, and 2 profiles. Table 12 contains 


TABLE 12

Mean T Score Values for MMPI Scales of the
Combined Counseling and No-Counseling Groups

                    Male                      Female
MMPI        Adjusted   Maladjusted    Adjusted   Maladjusted
scale       (N = 33)    (N = 24)      (N = 18)    (N = 21)
?             50.0        50.0          50.0        50.0
L             46.6        47.7          48.3        46.0
F             51.6        58.5          52.1        60.9
K             55.0        54.4          57.4        49.4
Hs            50.5        55.3          50.7        54.7
D             55.1        65.2          52.9        64.0
Hy            57.7        61.1          57.6        61.9
Pd            54.8        63.1          56.7        65.8
Mf            57.8        65.0          43.9        45.2
Pa            54.7        59.4          56.1        67.3
Pt            54.6        64.2          55.7        64.0
Sc            54.7        65.5          55.8        66.4
Ma            57.4        60.0          55.9        63.4
Si            48.0        54.5          47.7        55.1
Esᵃ           52.1        49.0          47.1        42.0
Mtᵃ           12.1        18.1          12.1        23.5

Note. N = 96.
ᵃ Raw score.


BENJAMIN KLEINMUNTZ 


the mean values on the MMPI scales for this 
group and Table 13 reports the Q sort results. 

The results obtained from Q sorting the 
combined counseling (N = 65) and no-counseling
(N = 31) groups (see Tables 9 and 11,
respectively) are again as one might predict. 
In the hit and miss percents and in the corre- 
lations with the criterion, the results reported 
in Table 13 lie somewhere within the range of 
highest and lowest percents and coefficients, 
respectively. The mean scale scores which ap- 
pear in Table 12 seem also to be almost aver- 
ages of the scores of scales shown in Tables 8 
and 10, respectively. The only surprising find- 
ing is the rather high sort/re-sort reliability 
of .96 obtained with the combined group. 

Random Normal Group: Group 6. The 
mean T score values for this group are pre- 
sented in Table 14. No separate Q sort was 
done on this sample, and it is presumed that 
these 30 students are adjusted on the basis of 
not having sought professional psychological 
assistance. This group was included in the 
combined sample of 126 MMPI profiles and 
served as part of the adjusted sample. Of 
course with such a criterion of adjustment 
there is the possibility of including some really 
maladjusted students in this group; however, 
if there is misclassification here (there were 
two MMPI profiles whose elevations seemed 
to reflect severe emotional disturbance), the 
error would serve to make the Q sort discrimi- 
nation task more difficult. 


TABLE 13

Percentage of Hits and Misses and Criterion and
Reliability Coefficients of MMPI Q Sorter

                   Hit %                   Miss %
Q sort      Valid       Valid        False       False
            positive    negative     positive    negative
First         69          67           33          31
Second        71          69           31          29

Correlation                      First    Second    Sort/
coefficient                      sort     sort      re-sort
Correlations with criterion
  (point biserial)                .47      .50
Reliability (Pearson
  product moment)                                     .96

Note. N = 96 combined counseling and no-counseling group
profiles.




Computer Programing of Tape Recorded 
Protocols 


Approximately 60 hours of tape recorded 
protocol were obtained from the expert Q 
sorter, and the recorded material was carefully 
edited and compiled in order to construct a 
set of sequential decision rules. The emphasis, 
during this editing period, was not placed on 
simulating human thought processes, but was 
placed rather on getting the task of devising 
MMPI rules done.² An important distinction
must be made at this point between three types 
of problem-solving computer programs so that 
the particular method used in this research 
may be seen in its proper perspective. The 
distinction, one which Newell and Simon 
(1961) have cogently stated in a recent publi- 
cation, is between computer programs that are 
written as direct attempts to simulate human 
processes and those written with the goal of 
performing the task without any special con- 
cern for human simulation. It is within the 
latter category that we were operating here, 

2 A study is currently under way in which simula- 
tion of the human’s thinking processes is of primary 
importance. The author thought at the outset of the 
present project that simulation could be incorporated 
within this study, but has learned since that time 
that this is a larger undertaking than was at first 
anticipated. 


TABLE 14

Mean T Score Values for MMPI Scales of the
Random-Normal Group

MMPI          Male        Female
scale       (N = 15)     (N = 15)
?             50.0         50.0
L             50.5         47.7
F             51.8         52.7
K             58.3         57.7
Hs            51.5         55.3
D             46.3         49.9
Hy            53.9         58.3
Pd            55.5         58.1
Mf            52.5         46.3
Pa            51.7         53.2
Pt            49.9         55.9
Sc            51.9         56.7
Ma            56.3         60.9
Si            47.7         49.1
Esᵃ           51.9         46.6
Mtᵃ            7.7         11.3

Note. N = 30.
ᵃ Raw score.


and these programs have a strong heuristic ele- 
ment in that they solve the problems in ques- 
tion by imitating human tricks. A third type 
of program, and one which was not used in 
this study, is the algorithmic approach which 
relies substantially on the arithmetic speed 
and “brute force” of the computer in perform- 
ing systematic routine calculations. 

During the tape recording procedure the 
MMPI interpreter was permitted to work at 
his own pace, and a minimum of coaching was 
necessary. It was anticipated at the outset of 
this phase of the study that considerable ver- 
bal interchange between the researcher and the 
interpreter might have to be tape recorded, 
but the ease with which explicit verbalizations 
were gotten obviated the need for such verbal 
exchange. The only coaching that was 
necessary was an occasional reminder to the 
Q sorter that he call out the code number of 
the profile he was considering and that he 
clarify a particular classification decision (e.g., 
sometimes the interpreter would neglect to 
mention the reason(s) he called a profile ad- 
justed or maladjusted). The information 
about the code number of a profile was im- 
portant because frequently when the Q sorter 
was midway in the sorting task he would 
change his mind about the adjustment-malad- 
justment decision of a particular profile. After 
the interpreter had completed all the Q sorting 
(e.g., five samples) the tape recordings were 
played back, and notes were taken of the 
particular scales and scale interactions that 
were utilized by him in arriving at his deci- 
sions. Also tallies were kept of the number of 
times scales and their interactions were used. 
As might be expected, the greatest amount of 
profile information was yielded during the 
initial hours of recorded playback. After sev- 
eral hours of listening to the tape recordings 
the listener could almost recite all the scales 
and interactions that were used. 

The MMPI profile information thus ob- 
tained from the Q sorter was then compiled 
into a set of sequential decision rules, and a 
flow chart was made of these rules. The por- 
tions of the taped protocol that were used and 
their corresponding MMPI decision rules are 
reproduced in Table 15. 

The flow chart of the decision rules is pre- 




TABLE 15

MMPI Decision Rules and Tape Recorded Protocol

Rule 1. If four or more clinical scales ≥ T score 70, call maladjusted.
Protocol 1. Now I'm going to divide these into two piles ... on the left [least adjusted] I'm throwing all mults with at least four scales primed.

Rule 2. If scales Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, and Si are ≤ 60 and if Ma ≤ 80 and Mt ≤ 10R, then call adjusted.
Protocol 2. I'll throw all mults to the right [most adjusted] if there's no clinical scale above a T score of 60. I'll let Ma go up as high as 80 ... maybe a raw score of 10 on Mt would be playing it safe ... so I'm looking at three things now and sorting according to these conditions.

Rule 3. If the first two scales in the Hathaway Code include Pd, Pa, or Sc, and at least one of these is ≥ 70, then call maladjusted (if Mf is among the first two scales, then examine the first three scales in the Hathaway Code).
Protocol 3. If either Pd, Pa, or Sc is primed, I'm putting it on the left side [least adjusted] ... it would also be nice to have all of these scales slightly more elevated than the others.

Rule 4. If Pa or Sc ≥ 70 and Pa, Pt, or Sc ≥ Hs, D, or Hy, call maladjusted.
Protocol 4. If the elevations are lopsided to the right with the left side of the profile fairly low, I'm throwing the mults to the left [least adjusted].

Rule 5. Call maladjusted if Pa ≥ 70 unless Mt ≤ 6R and K ≥ 65.
Protocol 5. Here's a paranoid character. I wish his K score were not quite so high and he could use more Mt ... when that Mt score is less than 10, I figure something must be stabilizing him. I like an inverted V with F high on the validity scales.

Rule 6. If Mt ≤ 6R, call adjusted.
Protocol 6. Boy, I don't know that Mt is too low to call her maladjusted. I'll settle for calling them adjusted if Mt is at a raw score of 6 or lower.

Rule 7. Call maladjusted if (Pa + Sc - 2·Pt) ≥ 20 and Pa or Sc ≥ 65.
Protocol 7. Here's a nice valley between Scales 6 and 8 and both 6 and 8 are high. I'll call this one maladjusted.

Rule 8. If D or Pt are the primary elevations and Es ≥ 45R, call adjusted.
Protocol 8. These 27 profiles are giving me a pain ... if 2 or 7 is too elevated like, say, higher than a T score of 80 and if the Es scale is approaching a raw score of 50 ... I'll call it adjusted.

Rule 9. If Pd ≥ 70 and (a) male Mt ≥ 15R or (b) female Mt ≥ 17R, call maladjusted.
Protocol 9. A primed Pd and an Mt raw score of 15 or more is going over to the left pile [least adjusted]. I guess on a male profile an Mt of 15 or more will do ... and an Mt of 17 or more on a female profile.

Rule 10. If Mt ≥ 23R and Es ≤ 45R, call maladjusted.
Protocol 10. With Mt high and Es low, I'll call maladjusted at this stage of the game.

Rule 11. If five or more clinical scales ≥ 65 and if either Pa or Sc ≥ 65, call maladjusted.
Protocol 11. Everything's up on this girl's MMPI. I'm especially bothered by the high Pa ... here's a high Sc ... everything else is up too ... over to the left [least adjusted].

Rule 12. Call adjusted if at least five clinical scales are between 40 and 60 and Es ≥ 45R.
Protocol 12. Here are a couple of nice, normal looking mults. All scales hugging a T score of 50, and Es is nice and high ... over to the right [most adjusted].

Rule 13. Call maladjusted if the profile is male and Mf ≥ 70 and Sc ≥ Pt and Sc ≥ 60.
Protocol 13. An elevated Mf is pretty common for boys around colleges, but when it's primed and when Sc is up and is higher than Pt, I'll throw it to the left [least adjusted].

Rule 14. If Si ≥ 60 and Pa ≥ 60 or Sc ≥ 70, call maladjusted.
Protocol 14. That's a fairly high Si and Pa is up. I'll call it maladjusted ... here's one with a high Si and Sc is also up. I'll call this maladjusted.

Rule 15. Call maladjusted if Es ≤ 35R.
Protocol 15. Here's a pretty good looking MMPI, but that low Es makes me think something might be wrong ... to the left [least adjusted].

Rule 16. Call adjusted if Mt ≤ 10R.
Protocol 16. These are all pretty bad looking mults. I'll call adjusted if the Mt is lower than 10.

Note. The subscript R refers to raw scores.




sented in Figure 1. Inspection of this flow 
chart and of Table 15 suggests that the MMPI 
interpreter considered approximately 16 dif- 
ferent combinations of scales and used rather 
gross cutoff scores in arriving at his decisions 
(e.g., primarily the T scores 40, 60, and 70 


were used). It was frequently impossible to 
ascertain the cutoff scores used by the inter- 
preter unless a check was made back to the 
reference MMPI profile. The highest level of 
arithmetic used by the expert and formalized 
in our decision rules was Pa + Sc - 2·Pt.
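
As an illustration of how sequential decision rules of this kind can be mechanized, the following is a minimal Python sketch (it is not the program used in the study) of Rules 1, 2, and 7 of Table 15. The inequality directions are read from the taped protocol, the treatment of Si as one of ten clinical scales is an assumption, and the function name, the dictionary layout, and the example profile are hypothetical.

```python
# Assumed set of "clinical scales"; whether Si belongs here is an assumption.
CLINICAL = ["Hs", "D", "Hy", "Pd", "Mf", "Pa", "Pt", "Sc", "Ma", "Si"]


def classify(t, mt_raw):
    """Apply Rules 1, 2, and 7 in order; the first rule that fires decides.

    t      -- dict of T scores keyed by scale name
    mt_raw -- Mt raw score
    """
    # Rule 1: four or more clinical scales at or above T = 70.
    if sum(t[s] >= 70 for s in CLINICAL) >= 4:
        return "maladjusted"
    # Rule 2: all listed scales at or below 60, Ma at or below 80,
    # and Mt raw score at or below 10.
    listed = [s for s in CLINICAL if s != "Ma"]
    if all(t[s] <= 60 for s in listed) and t["Ma"] <= 80 and mt_raw <= 10:
        return "adjusted"
    # Rule 7: (Pa + Sc - 2*Pt) at or above 20, and Pa or Sc at or above 65.
    if t["Pa"] + t["Sc"] - 2 * t["Pt"] >= 20 and (t["Pa"] >= 65 or t["Sc"] >= 65):
        return "maladjusted"
    return "unclassified"


# Hypothetical profile: a "valley" at Pt between elevated Pa and Sc.
flat = {s: 55 for s in CLINICAL}
valley = dict(flat, Pa=75, Sc=72, Pt=60)
print(classify(valley, mt_raw=14))
```

Ordering matters in such a scheme: a profile that satisfies an early rule never reaches the later ones, so the sequence of rules is itself part of the decision procedure.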


[Figure 1 not reproduced.]

Fig. 1. Flow chart of MMPI decision rules.




The flow charted information was then pro- 
gramed into 20-GATE, which is one of the 
languages in use at Carnegie and is an alge- 
braic coding system which facilitates the writ- 
ing of computer programs for the Bendix G-20 
model digital computer. The input data of 
the program consisted of a profile identifica- 
tion number, sex identification, criterion classi- 
fication (adjusted or maladjusted), and 16 
MMPI raw scores (K corrected) which were 
converted to T scores by the program. The 
output of the program, part of which appears 
in Table 16, consisted of the aforementioned
input variables and approximately 21 compu- 
tations of various indexes (e.g., Anxiety In- 


dex, Internalization Ratio, and Hathaway 
Code) and scale combinations (e.g., Es - Mt,
Pa - K, Hs + Hy - 2·D). The program also
specified the number of MMPI rules which 
were applied to a specific profile and the deci- 
sions reached by these rules. Also printed out 
were the hit rates of each of the rules and a 
2x2 table of valid negative and positive and 
false negative and positive hit percents 
achieved by the computer programed decision 
rules. The latter format is reproduced in
Table 17. 
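
The scale combinations printed by the program are simple arithmetic on the T scores. The sketch below recomputes the four combinations shown in the Table 16 printout for case No. 210, with the T scores assigned to scales in the column order of that printout; the function and dictionary names are hypothetical, and the code is an illustration in Python rather than the original 20-GATE program.

```python
def scale_combinations(t):
    """Compute the four scale combinations shown in the Table 16 printout."""
    return {
        "Pa+Sc-2Pt": t["Pa"] + t["Sc"] - 2 * t["Pt"],
        "2F-L-K": 2 * t["F"] - t["L"] - t["K"],
        "Hs+Hy-2D": t["Hs"] + t["Hy"] - 2 * t["D"],
        "Mt-Es": t["Mt"] - t["Es"],
    }


SCALES = ["?", "L", "F", "K", "Hs", "D", "Hy", "Pd", "Mf",
          "Pa", "Pt", "Sc", "Ma", "Si", "Es", "Mt"]

# T scores of case No. 210 as read from the Table 16 printout.
case_210 = dict(zip(SCALES, [44, 43, 53, 46, 39, 60, 60, 74, 59,
                             64, 52, 44, 68, 42, 54, 56]))

combos = scale_combinations(case_210)
for name, value in combos.items():
    print(name, "=", value)
```

For this case the recomputed values agree with those in the printout (4, 17, -21, and 2, respectively).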

The program was tested on the MMPIs of 
the following groups: (a) the counseling 
group (N = 65), (b) the no-counseling group


TABLE 16

A Reproduction of a Printout of Three Cases on Which the Computer Yielded Clinical Judgments

New case
Identification No. 209     Male     Criterion is adjusted
Question  L   F   K   Hs  D   Hy  Pd           Mf  Pa  Pt  Sc  Ma  Si  Es  Mt
   41     47  51  77  57  53  64  [illegible]  61  62  60  63  58  40  72  27
Raw score                                                              58   1
Beta = 13     Band 4     Delta = 10     AI = 50     IR = .89
Pa + Sc - 2·Pt = 5     Hs + Hy - 2·D = 15     Mt - Es = -45
2·F - L - K = -22
438657-912/0:  Hathaway Code (Welsh extension)
Rules              Call this person maladjusted
Rules 18, 26, 22   Call this person normal
This person is called normal on the basis of precedence rule 0

New case
Identification No. 210     Male     Criterion is maladjusted
Question  L   F   K   Hs  D   Hy  Pd  Mf  Pa  Pt  Sc  Ma  Si  Es  Mt
   44     43  53  46  39  60  60  74  59  64  52  44  68  42  54  56
Raw score                                                     44  19
Beta = -3     Band 3     Delta = 39     AI = 66     IR = [illegible]
Pa + Sc - 2·Pt = 4     Hs + Hy - 2·D = -21     Mt - Es = 2
2·F - L - K = 17
4'9632-57/80:1  Hathaway Code (Welsh extension)
Rules 6            Call this person maladjusted
Rules              Call this person normal
This person is called maladjusted on the basis of precedence rule 0

New case
Identification No. 211     Male     Criterion is maladjusted
Question  L   F   K   Hs  D   Hy  Pd  Mf  Pa  Pt  Sc  Ma  Si  Es  Mt
   41     43  55  64  52  65  62  55  76  47  62  69  45  46  59  49
Raw score                                                     48  12
Beta = 14     Band 4     Delta = -12     AI = 73     IR = 1.10
Pa + Sc - 2·Pt = -8     Hs + Hy - 2·D = -16     Mt - Es = -10
2·F - L - K = 3
5'8273-41/609:  Hathaway Code (Welsh extension)
Rules 11, 14       Call this person maladjusted
Rules 22           Call this person normal
This person is called maladjusted on the basis of precedence rule 2




TABLE 17

A Computer Printout of the Hit Rates of Each
Rule and of the Entire MMPI Program

                      Number of profiles
Rule     Hit rate     applied to
 1         .83           18
 2         .86            7
 3        1.00            7
 4         .88           17
 5         .78            9
 6         .73           15
 7         .77           13
 8         .75            4
 9         .89           18
10         .50            6
11        1.00            7
12         .79           14
13         .71            7
14         .33           21
15        1.00            8
16         .54           13

Total MMPI
Classification     Valid     False
Positive             63        14
Negative             86        37


(N=31), (c) the combined aforementioned 
samples (N = 96), and (d) the total sample
(N=126) which included the random-normal 
group. The hit percents for these samples are 
presented in Table 18. 

Inspection of the results reported in Table 
18 and comparison of these hit percents with 
those of the expert Q sorter indicate that the 
programed rules fared exceptionally well with 
the sample of 65 clients (see Table 9), but 
achieved a somewhat lower valid positive hit 
rate than did the Q sorter on the total sample 
(N=126; see Table 3) and on the combined 


counseling–no-counseling sample (N = 96; see
Table 13). The valid positive hit rate of the 
programed rules on the fraternity-sorority 
sample was a meagre 35%, and this was a
considerable drop in accuracy when compared
with 65 and 71% of the expert (see Table 11). 
It should be noted, however, that the valid 
negative hit rate of the programed rules on 
the fraternity-sorority sample was 100%; and 
this was an improvement over the sorter’s 64 
and 71% valid negative hit rate. An overall 
comparison of the rules' and the Q sorter's hit
rates on the appropriate samples (e.g., com- 
pare Tables 3, 9, and 13 with Table 18) indi- 
cates that in all instances the computer rules 
were superior in correctly classifying adjusted 
students into their categories, but were in- 
ferior in most instances in classifying malad- 
justed students into the valid positive cate- 
gory. In short, it seems that the expert Q 
sorter was biased in the direction of classifying
more students as maladjusted than adjusted.
From a practical point of view, if the
programed rules are to be applied among col- 
lege populations, they would be of greater use 
if the pathologic bias of the Q sorter could be 
built into them. This is a value judgment 
which assumes that it is more costly to mis- 
classify a maladjusted student into the ad- 
justed category than to call an adjusted stu- 
dent maladjusted. From a methodological 
point of view the discrepancies between the Q 
sorter’s and automated rules’ hit percents are 
important because they indicate the extent to 
which the programed rules deviated from a 
perfect simulation of the human judge. In 
large part, these discrepancies probably arose 
because of this researcher’s nonoptimal order- 
ing of the programed rules. 
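
The hit and miss percents compared above can be computed from paired criterion and program classifications. The following minimal sketch assumes, as the paired columns of Table 9 suggest, that valid positives and false negatives are percentages of the criterion-maladjusted profiles and that valid negatives and false positives are percentages of the criterion-adjusted profiles; it also assumes that every profile is classified. The function name and the nine-profile example are hypothetical.

```python
def hit_miss_percents(criterion, decision):
    """Return hit and miss percents for two parallel lists of labels.

    Each label is "maladjusted" or "adjusted". Percents are taken within
    each criterion group (an assumption about how the tables were built).
    """
    groups = {"maladjusted": [], "adjusted": []}
    for crit, dec in zip(criterion, decision):
        groups[crit].append(dec)
    if not groups["maladjusted"] or not groups["adjusted"]:
        raise ValueError("both criterion groups must be represented")
    mal, adj = groups["maladjusted"], groups["adjusted"]
    valid_pos = round(100 * mal.count("maladjusted") / len(mal))
    valid_neg = round(100 * adj.count("adjusted") / len(adj))
    return {
        "valid_positive": valid_pos,
        "valid_negative": valid_neg,
        "false_positive": 100 - valid_neg,
        "false_negative": 100 - valid_pos,
    }


# Hypothetical example: 4 maladjusted and 5 adjusted criterion profiles.
criterion = ["maladjusted"] * 4 + ["adjusted"] * 5
decision = (["maladjusted"] * 3 + ["adjusted"]        # one false negative
            + ["adjusted"] * 4 + ["maladjusted"])     # one false positive
percents = hit_miss_percents(criterion, decision)
print(percents)
```

Under this convention a bias toward calling profiles maladjusted raises the valid positive percent at the expense of the valid negative percent, which is the trade-off discussed in the preceding paragraph.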


TABLE 18

Percentage of Hits and Misses of MMPI Decision Rules

                                         Hit %                   Miss %
Group                              Valid       Valid        False       False
                                   positive    negative     positive    negative
Total sample (N = 126)               63          88           12          37
Combined counseling (N = 65)         [illegible]
No counseling (N = 31)               35          100          [illegible]
Combined counseling and
  no counseling (N = 96)             [illegible]




Revision of MMPI Decision Rules 


Up to this point MMPI decision rules were 
devised primarily by utilizing information that 
the expert furnished in his tape recordings. 
In an effort to improve upon the expert’s Q 
sort performance and to sharpen the existing 
decision rules, the method used consisted 
mainly of statistical searching and shuttling 
back and forth between intuitive hunches 
about combinations of various scales and their 
possible effects on the criterion hit percents. 
A number of developed and available MMPI 
indexes was tried and some of these helped 
to improve the rules. Among those which were 
adopted were Welsh’s (1952) Internalization 
Ratio and Anxiety Index, the four-prime rule 
of Meehl and Dahlstrom (1960), and the 
Hathaway (1947) Code. Most importantly, 
in the derivation of the new rules, the com- 
puter’s capabilities for storing and retrieving 
large quantities of information and its facility
for high speed arithmetical operations were 
exploited to the utmost. For example, one of 
the techniques that was used was to let the 
computer apply all the rules to a particular 
MMPI profile and to withhold its maladjusted 
or adjusted decision until a vote could be 
taken of the number of rules that favored one 
of the two classifications. In this way if six 
rules called a profile adjusted and two rules 
classified it as maladjusted, then the former 
diagnosis would be made. Also, on the basis 
of an empirical determination of the relative 
strength or weakness of a particular decision 
rule, the computer program was “taught” to 
attend to specific patterns of rules rather than 
just to the number of votes that each profile 
received. The pattern analytic approach to 
the rules themselves was made possible by 
writing the computer program in such a way so 
that it printed the hit rates and the number of 
profiles to which each decision rule applied 
(see Table 17). 
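
The voting technique described above can be sketched in a few lines. The example below is a hedged illustration in Python, not the original program: the function name is hypothetical, and the handling of a tied vote (left unclassified) is an assumption, since the text does not say how ties were resolved.

```python
def vote(rule_decisions):
    """Decide by simple majority among the rules that applied to a profile.

    rule_decisions -- list of "adjusted" / "maladjusted" calls, one per
                      rule that fired on the profile
    """
    adjusted = rule_decisions.count("adjusted")
    maladjusted = rule_decisions.count("maladjusted")
    if adjusted > maladjusted:
        return "adjusted"
    if maladjusted > adjusted:
        return "maladjusted"
    # Tie (or no rule applied): treatment assumed here, not from the source.
    return "unclassified"


# The case given in the text: six rules call adjusted, two call maladjusted.
calls = ["adjusted"] * 6 + ["maladjusted"] * 2
print(vote(calls))
```

A pattern-analytic refinement of the kind the text describes would replace the equal votes with weights or precedence among rules, based on each rule's empirically determined hit rate.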

Finally, the completed set of MMPI deci- 
sion rules included the original expert inter- 
preter’s information, a number of intraprofile 
slope characteristics that the expert failed to 
observe, and an optimal ordering of the vari- 
ous components that comprised the whole. The 
hit percents for the total sample of 126 pro- 
files are reported in Table 19. As can be seen 


from the results in this table the programed 
decision rules were a considerable improve- 
ment upon the original MMPI rules and upon 
the expert’s Q sort. 

The MMPI decision rules, which consist of 
35 sequential steps and instructions that gov- 
ern the application of specific rules, are pre- 
sented in the Appendix. Although it is not 
advisable for practical reasons of time to at- 
tempt to apply these rules without the aid of a 
computer, it can probably be done. To proc- 
ess each MMPI profile by hand could take 
up to 30 minutes. The total estimated time in 
which the computer can do a similar job lies 
within the range of 75 of a second to 2} 
seconds; and these computer times depend on 
the ingenuity used in writing the program and 
on the speed of the particular model of elec- 
tronic digital computer to be used. 


Cross-Validation Samples 


The proof of the efficacy of any psychologi- 
cal measuring instrument, mathematical 
model, or set of rules lies in its ability to hold 
up when applied to new samples. Four new 
groups of college students served as cross- 
validation samples. The MMPI profiles and 
their accompanying diagnostic judgments as to 
their adjustment-maladjustment status were 
drawn from Brigham Young University (N =
100), the University of Nebraska (N = 116),
the University of Iowa (N = 155), and the
University of Missouri (N = 198). MMPI
test administration to these students had taken 
place at the time that they sought counseling.³


3 There were 15 exceptions to this test administration
procedure in the Brigham Young sample. These
were 15 female students who had the MMPIs ad- 
ministered to them at freshman orientation time and 
then contacted the counseling center at a later date. 


TABLE 19

Percentage of Hits and Misses of Revised MMPI
Decision Rules with Total Sample

                                                     Total
Classification     Valid     False     Unclassified  unclassified
Positive             91        12          4              2
Negative             84         9          0

Note. N = 126.




In almost all instances the maladjustment- 
adjustment judgment was arrived at by ac- 
cepting the consulted counselor’s decision. It 
should be noted that in terms of the way they 
were selected these four cross-validation sam- 
ples resembled most the criterion subsample 
which was earlier called the new client group 
(N=28). 

The mean MMPI T score values for the 
cross-validation groups are presented in Tables 
20, 21, 22, and 23; and a report of the valid 
and false positive and negative findings is pre- 
sented in Table 24. The proportion of malad- 
justed students who were correctly classified 
as such (e.g., the valid positives) ranged from 
68 to 84%; and the percentage of adjusted 
students correctly classified (valid negatives) 
ranged from 53 to 94. The percentage of 
MMPI profiles left unclassified by the rules 
is almost negligible in three samples, but did 
reach 7% in the Iowa sample. The size of 
the cross-validation “shrinkage” is not large 
and attests rather well to the validity gen- 
eralization of MMPI decision rules to other 
college populations. Nevertheless there is 
shrinkage for which an account must be given 
in order to help increase the applicability of 
the rules to new situations. 


TABLE 21

Mean T Score Values for MMPI Scales of the
Nebraska Group

                    Male                      Female
MMPI        Adjusted   Maladjusted    Adjusted   Maladjusted
scale       (N = 26)    (N = 24)      (N = 54)    (N = 12)
?            50.0        50.0          50.0        50.0
L            48.88       47.37         48.94       46.67
F            50.15       58.12         48.09       61.9
K            62.65       55.25         60.65       53.92
Hs           51.42       58.17         48.26       55.83
D            49.12       70.0          44.54       64.33
Hy           58.42       64.04         54.65       61.92
Pd           59.08       66.62         53.63       64.75
Mf           58.58       64.54         48.74       45.25
Pa           55.35       61.71         53.72       61.08
Pt           56.5        69.79         49.91       64.67
Sc           55.73       69.5          51.46       67.83
Ma           56.62       60.63         55.81       62.67
Si           44.08       52.42         44.07       55.92
Esᵃ          51.15       46.08         49.65       42.5
Mtᵃ           5.58       21.46          6.83       21.33

Note. N = 116.
ᵃ Raw score.


There are at least two possible explanations 
for the size of the cross-validation shrinkage: 
(a) there are certain obvious differences be- 
tween the criterion group and the new sam- 


TABLE 20

Mean T Score Values for MMPI Scales of the
Brigham Young Group

                    Male                      Female
MMPI        Adjusted   Maladjusted    Adjusted   Maladjusted
scale       (N = 25)    (N = 25)      (N = 25)    (N = 25)
?            50.0        50.0          50.0        50.0
L            51.44       49.56         54.36       47.92
F            52.48       57.4          51.92       59.46
K            58.56       53.04         57.12       54.08
Hs           51.16       56.56         51.6        53.25
D            51.08       61.32         50.08       59.37
Hy           56.76       62.32         57.12       59.83
Pd           58.04       66.68         56.76       66.67
Mf           58.68       65.04         50.04       47.71
Pa           54.52       58.4          56.16       64.29
Pt           57.96       66.76         56.0        65.92
Sc           57.56       66.88         55.04       66.33
Ma           55.56       57.0          57.52       58.42
Si           47.84       55.84         48.68       55.92
Esᵃ          47.76       43.52         45.08       40.08
Mtᵃ          11.84       20.04         13.08       20.79

Note. N = 100.
ᵃ Raw score.


TABLE 22

Mean T Score Values for MMPI Scales of the
Iowa Group

                    Male                      Female
MMPI        Adjusted   Maladjusted    Adjusted   Maladjusted
scale       (N = 48)    (N = 30)      (N = 50)    (N = 27)
?            50.0        50.0          50.0        50.0
L            47.02       45.87         48.94       45.41
F            54.79       63.6          52.58       60.48
K            55.6        51.4          56.12       53.22
Hs           51.9        58.5          49.22       53.19
D            56.46       71.33         49.94       63.89
Hy           57.27       64.87         55.28       61.7
Pd           61.06       68.27         58.62       69.22
Mf           59.81       68.13         50.28       45.04
Pa           57.15       61.5          54.82       61.96
Pt           59.69       75.47         56.82       67.81
Sc           60.79       73.57         55.68       [illegible]
Ma           57.4        61.23         57.1        [illegible]
Si           52.08       62.3          50.34       [illegible]
Esᵃ          49.73       45.9          45.06       [illegible]
Mtᵃ          14.33       27.1          13.7        22.74

Note. N = 155.
ᵃ Raw score.




TABLE 23

Mean T Score Values for MMPI Scales of the
Missouri Group

                    Male                      Female
MMPI        Adjusted   Maladjusted    Adjusted   Maladjusted
scale       (N = 87)    (N = 41)      (N = 54)    (N = 16)
?            50.0        50.0          50.0        50.0
L            48.07       47.68         50.78       48.64
F            53.01       59.12         52.9        55.79
K            54.18       51.07         58.86       57.14
Hs           49.63       52.63         50.55       52.07
D            54.3        62.78         50.71       56.29
Hy           55.05       58.51         54.73       60.36
Pd           56.79       62.54         56.47       63.93
Mf           56.49       61.63         54.31       48.0
Pa           51.38       57.44         53.27       56.43
Pt           58.24       64.2          54.67       60.64
Sc           55.84       65.0          55.39       59.29
Ma           57.98       58.9          54.8        54.5
Si           48.67       54.0          48.51       51.29
Esᵃ          48.56       46.02         47.45       44.5
Mtᵃ          13.76       17.15         11.65       15.29

Note. N = 198.
ᵃ Raw score.


ples, and (b) the diagnostic judgments ren- 
dered by the counselors on the criterion and 
cross-validation adjusted and maladjusted stu- 
dents may be inaccurate. Each of these ex- 
planations will be discussed separately. 

Differences between the Criterion and Cross- 
Validation Samples. The 569 students of the 
cross-validation group were administered their 
MMPIs at the time they sought counseling;
therefore, they resembled most closely the
new client group (N = 28). Perhaps the
cross-validation shrinkage might have been smaller
if the criterion and the new samples had con- 
tained a comparable proportion of new clients, 
orientation and fraternity-sorority MMPIs. 
Moreover, the criterion and the cross-valida- 
tion samples were different in the ratio of mal- 
adjusted to adjusted MMPIs. For example, 
the ratio of maladjusteds to adjusteds in the 
criterion group was 36% (N = 45) to 64% (N = 81);
and in the cross-validation samples the
ratios were 50 to 50% (Brigham Young), 31
to 69% (Nebraska), 37 to 63% (Iowa), and
28 to 72% (Missouri). It is interesting to
note, in this regard, that the sample in which 
the highest valid positive hit percent was 
achieved (Iowa, 84%) was also the sample in 
which the ratio of maladjusted to adjusted 


MMPIs most closely resembled the criterion 
group’s base rates. 

Cross-validation shrinkage may also have 
resulted from the fact that the MMPI deci- 
sion rules are highly dependent on the use of 
appropriate T score cutoff points; and the 
scores used with the criterion group may not 
be applicable to new samples. It is quite pos- 
sible that minor adjustments in T scores, up or 
down, may be all that is necessary to obtain 
higher hit percents. 

Diagnostic Judgments. A great deal of care 
was not exercised in this phase of the study 
in obtaining accurate diagnostic judgments, 
and it was assumed that any misclassifications 
that were made might be reflected in cross- 
validation shrinkage. The really surprising 
finding, however, is the small shrinkage in view 
of the fact that there were no less than five 
different persons rendering clinical judgments 
for the five samples. Undoubtedly, since there


TABLE 24

Percent Hits and Misses of Revised MMPI
Decision Rules with Four Cross-Validation Samples

                                                                Total
Sample                        Valid    False    Unclassified    unclassified
Brigham Young University
  Positive                     80       36          2               1
  Negative                     64       18          0
University of Nebraska
  Positive                     72        6          3               1
  Negative                     94       25          0
University of Iowa
  Positive                     84       38          4               7
  Negative                     53       12          9
University of Missouri
  Positive                     68       28          2               2
  Negative                     70       26          5

Note. Total N = 569. Brigham Young University N = 100:
Adjusted N = 50, Maladjusted N = 50; University of
Nebraska N = 116: Adjusted N = 80, Maladjusted N = 36;
University of Iowa N = 155: Adjusted N = 98, Maladjusted
N = 57; University of Missouri N = 198: Adjusted N = 141,
Maladjusted N = 57.


COMPUTER IDENTIFICATION OF MALADJUSTMENT 17 


was no standardization of diagnostic proce- 
dures, these five persons may have been ap- 
plying rather different criteria of adjustment 
and maladjustment. But aside from the dif- 
ferent frames of reference used by the various 
counselors, and even if greater diagnostic 
standardization had been applied, there al- 
ways remains the question of who is more ac- 
curate, the clinician or the measuring instru- 
ment? In other words, in principle it is possi- 
ble to postulate that the test is “picking up” 
certain adjusted or maladjusted features of a 
client that the clinician has missed. Regardless 
of whether the clinician or the test is more 
accurate, this state of affairs contributes to 
cross-validation shrinkage. 


Discussion 
Diagnostic Value of the Decision Rules 


In their present form, the MMPI decision 
rules (see the Appendix) are not ready for use 
in routine screening of maladjusted college 
students. Our findings indicated that the hit
and miss rates of the rules were affected by 
sample size, proportion of maladjusted to ad- 
justed students within a particular sample, and 
by shifts in the ratios of maladjusted to ad- 
justed students in new samples from the cri- 
terion group’s base rates (see Table 24). It 
was not possible within the structure of the 
present study to partial out the relative effects 
upon the hit rates of each of these factors; 
and until such time as these factors and their 
corresponding effects are isolated, the MMPI 
rules should be used for research purposes 
only. 
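One way to see how base rates enter the picture is to combine the valid positive and valid negative percents of Table 24 into a single overall hit percent, weighting each by the size of its group. The Python sketch below does this under two assumptions that the text does not state outright: that the valid positive percent is taken over the maladjusted students and the valid negative percent over the adjusted students, and that unclassified profiles count as non-hits. The names `TABLE_24` and `overall_hit_pct` are hypothetical.

```python
# Per sample: (maladjusted N, adjusted N, valid positive %, valid negative %),
# transcribed from Table 24 and its note.
TABLE_24 = {
    "Brigham Young": (50, 50, 80, 64),
    "Nebraska": (36, 80, 72, 94),
    "Iowa": (57, 98, 84, 53),
    "Missouri": (57, 141, 68, 70),
}


def overall_hit_pct(n_maladjusted, n_adjusted, valid_pos, valid_neg):
    """Percent of all profiles classified correctly: each group's hit rate
    weighted by that group's share of the sample."""
    total = n_maladjusted + n_adjusted
    return (n_maladjusted * valid_pos + n_adjusted * valid_neg) / total


for name, row in TABLE_24.items():
    print(f"{name}: {overall_hit_pct(*row):.1f}% correct overall")

# The same Nebraska hit rates applied to the criterion group's
# composition (45 maladjusted, 81 adjusted) give a different overall figure.
print(f"Nebraska rates at criterion base rate: {overall_hit_pct(45, 81, 72, 94):.1f}%")
```

Because the overall figure is a weighted average, the same pair of hit rates yields a different overall percent whenever the proportion of maladjusted students changes, which is why the separate contributions of sample size and base rate could not be isolated here.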

It was implicitly assumed throughout this 
study that there exists universal agreement 
about the meaning of emotional adjustment 
and maladjustment within college settings. In 
other words, it was assumed for purposes of 
this research that the counselors from the uni- 
versities of the various cross-validation sam- 
ples applied identical adjustmental criteria 
and that these criteria were similar to the 
ones used at the Carnegie Institute of Tech- 
nology counseling center. This assumption 
obviously is a tenuous one and the differential 
effects upon the performance of MMPI decision
rules of dissimilar adjustmental definitions
are not known. Perhaps the most surprising
finding of this study, in view of the lack
of care that was exercised in defining emo- 
tional adjustment, was that MMPI rules per- 
formed as well as they did in the cross-valida- 
tion groups. However, it is clear that the 
diagnostic value of the decision rules depends 
not only upon the base rate considerations 
mentioned above, but also upon the particular 
adjustmental context within which one chooses
to operate. 


Cooperation between Man and Machine 


The final version of the decision rules is a 
result of the cooperation between the Q sorter, 
this investigator, and the computer. Without 
the Q sorter’s tape recorded instructions to 
guide it, the machine would have cycled end- 
lessly from one piece of MMPI information 
to another; and without the brute arithmetic 
force of the machine, this investigator would 
have been hesitant to experiment with as many 
MMPI scale and slope characteristics as he 
did. The cooperation between man and ma- 
chine can be summarized as follows: The Q 
sorter furnished many hypotheses about the 
way he combines MMPI information before 
arriving at a maladjustment-adjustment deci- 
sion; and the computer was programed to 
process the same MMPI profiles (e.g., the cri- 
terion group or any subsample thereof) in 
much the same manner as the Q sorter. Since 
no effort was made to obtain a one-to-one rela- 
tionship between the Q sorter and the com- 
puter program, there were marked hit and 
miss rate discrepancies between man and ma- 
chine. When it seemed that as much informa- 
tion as was useful had been borrowed from the 
Q sorter’s verbalizations, the author put aside 
the tape recordings and programed the ma- 
chine so that it experimented with an addi- 
tional 20 or 30 configural MMPI slope and 
scale characteristics during its interpretive 
process. Many of the additional hypotheses 
were borrowed from the Meehl-Dahlstrom 
rules, but many more reflected this author’s 
experience with MMPIs of college students. 
It is quite possible that the Q sorter could 
have furnished us with, and perhaps even 
used, these experimental profile characteristics;
but it is here that the special powers of
the machine for this type of assignment came
to the fore. Whereas the human may apply 
certain hypotheses and rules of thumb oc- 
casionally, the computer, once it is programed 
to contain these hypotheses or rules, will
apply such rules always and consistently. 


Man versus Machine 


Has the machine’s superiority over man 
been demonstrated by the finding that the 
computer program was an improvement over 
the Q sorter’s hit rate? Definitely not. The 
final set of MMPI rules was an improvement 
over the Q sorter’s performance on the cri- 
terion sample, but this is not surprising in 
view of the fact that much effort and time 
were expended precisely for that purpose. In 
other words, much of the statistical shuttling 
back and forth between MMPI rules and the 
criterion sample consisted of formulating ad 
hoc rules and adjusting cutting scores so that 
the hit rate of the final set of rules is maxi- 
mized. As a matter of fact, considering the 
amount of time and effort spent in sharpening 
the computer programed rules so that they 
perform better than the Q sorter, their per- 
formance with the cross-validation samples 
was disappointing. Although no comparisons 
were made in this study between the machine’s 
and the man’s MMPI performance, there is 
evidence to suggest that our expert Q sorter 
could probably equal or surpass the machine’s 
performance. For example, the original dis- 
crimination task on the criterion sample was 
to the Q sorter what the cross-validation task 
was to the computer programed rules, and the 
Q sorter’s performance was superior to the 
machine’s in almost all instances. It is reasonable
to expect that when confronted with the
cross-validation samples, our Q sorter would
probably do as well as he did on the criterion
group’s MMPI profiles. But however that may
be, the machine is certainly superior to the 
human in one important respect: when con- 
fronted with the task of screening several 
hundreds of thousands of MMPI profiles, the 
machine will gallop along its merry way with- 
out even a grumble or a thought about where 
it might more profitably be spending its time. 


SUMMARY AND SUGGESTIONS FOR FURTHER 
RESEARCH 


The specific aims for which this study was designed are: (a) to devise a set of objective decision rules to aid in the identification of maladjustment among college students, (b) to demonstrate the feasibility of mass processing of personality tests by means of the digital computer, and (c) to apply the computer in the area of diagnostic decision making. Each of these aims has been achieved, and the procedure used was that of computer programing of the “maladjusted” versus the “adjusted” decisions of an expert test interpreter. The interpreter’s decision making processes were tape recorded while he was thinking aloud during the Q sorting of MMPI profiles of 126 college students. Diagnostic judgments had been independently obtained on each of the 126 students, and these judgments (i.e., maladjusted or adjusted) served as the criteria that the expert tried to predict during his Q sorting sessions. The programed decision rules, which were based on the MMPI interpreter’s protocol, and which were improved upon by a process of trial-and-error statistical checking, yielded a greater hit percentage than the decisions of the original interpreter. The programed rules held up moderately well when applied to the MMPI profiles of four new cross-validation samples.

A number of problems for further research 
suggest themselves, and these are as follows: 


1. A pathologic bias was built into the MMPI decision rules because the emphasis throughout this study was placed on devising a set of rules that would “pick up” emotional maladjustment. A price was paid for this pathologic bias in that the false positive rate was high in three of the four cross-validation samples (see Table 24). The problem which now needs study is whether this pathologic bias will be amplified when the rules are applied to large, entering freshman samples.

2. There is a difference between identification and prediction of a criterion. In the former instance some statement is made in the present about the present state of affairs; prediction refers to a forecast in the present about some future event or state of affairs. The rules devised in this study discriminate between adjusted and maladjusted students at the time of testing (i.e., in the present) and therefore can be used to call attention to students in need of assistance. A problem of considerable importance to both the student and the college administration is whether or not MMPI rules forecast emotional maladjustment. Another way of stating this problem might be to indicate that the concurrent validity of the rules has been established but their predictive validity needs further investigation.4

3. It is suggested that research with MMPI maladjustment rules be directed to the elaboration of their construct validity and to the understanding of the phenomenology and dynamics of cases which are “test hits” and “test misses” from the concurrent validity standpoint.

4. Past research has indicated that there are substantial score differences between certain types of college groups on MMPI scales. These differences suggest that normative studies with the decision rules be conducted on specific samples and that cutting scores on the various MMPI scales be adjusted appropriately in order to increase the “hit rate” with certain groups.

5. The decision rules, as they now stand, offer a trichotomous classification into the maladjusted, adjusted, or unclassified categories. It is possible with this limited classificatory scheme to have a psychotic as well as a mildly neurotic MMPI profile thrown into the maladjusted category. A scheme should be worked out for a scaling system whereby one profile might be called more or less maladjusted than another. For example, since all 35 MMPI decision rules (see the Appendix) are applied to each profile, one profile might receive a higher maladjustment score than another on the basis of the greater number of maladjusted rules that were applicable to that particular profile. In turn, one should investigate whether clinical judgments, which were arrived at independently of the decision rules, attest to the more or less maladjusted status of a particular MMPI. As a matter of fact, information about the degree of maladjustment was contained in the criterion Q sort tasks. For the purposes of the present study, a choice was made to discard this information. However, it may be of considerable interest to note the correspondence between the computer program’s classificatory scheme and that of the experienced human MMPI Q sorter.

6. In this study a heuristic rather than a simulating approach was used to study clinical decision making. That is to say, instead of paralleling closely on a computer the thought processes of the expert test interpreter, this study settled for merely borrowing from the expert some strategies and rules of thumb for interpreting MMPIs. A contribution of considerable methodological and theoretical importance for profile analysis in particular and for clinical decision making generally would be to trace the clinician’s problem solving behavior in its every detail.


4 The distinction which has been made here should not obscure the fact that the criterion and the cross-validational samples were obtained by a mixture of concurrent and predictive validating procedures. The reader can, if he so desires, extricate these two sources of data in the criterion sample; however, the distinctions are not clear in the cross-validational groups.


REFERENCES


BARRON, F. An Ego-Strength scale which predicts response to psychotherapy. J. consult. Psychol., 1953, 17, 327-333.

DAHLSTROM, W. G., & WELSH, G. S. An MMPI handbook. Minneapolis: Univer. Minnesota Press, 1960.

GOUGH, H. G., & PEMBERTON, W. H. Personality characteristics related to success in practice teaching. J. appl. Psychol., 1952, 36, 307-309.

HATHAWAY, S. R. A coding system for MMPI profiles. J. consult. Psychol., 1947, 11, 334-337.

HATHAWAY, S. R., & McKINLEY, J. C. A multiphasic personality schedule (Minnesota): I. Construction of the schedule. J. Psychol., 1940, 10, 249-254.

KLEINMUNTZ, B. An extension of the construct validity of the Ego Strength scale. J. consult. Psychol., 1960, 24, 463-464. (a)

KLEINMUNTZ, B. Identification of maladjusted college students. J. counsel. Psychol., 1960, 7, 209-211. (b)

KLEINMUNTZ, B. Annotated bibliography of MMPI research among college populations. J. counsel. Psychol., 1962, 9, 373-396.

KLEINMUNTZ, B., & ALEXANDER, L. B. Computer program for the Meehl-Dahlstrom MMPI profile rules. Educ. psychol. Measmt., 1962, 22, 193-199.

McKINLEY, J. C., HATHAWAY, S. R., & MEEHL, P. E. The MMPI: VI. The K scale. J. consult. Psychol., 1948, 12, 20-31.

MEEHL, P. E. Configural scoring. J. consult. Psychol., 1950, 14, 165-171.

MEEHL, P. E., & DAHLSTROM, W. G. Objective configural rules for discriminating psychotic from neurotic MMPI profiles. J. consult. Psychol., 1960, 24, 375-387.

NEWELL, A., SHAW, J. C., & SIMON, H. A. Empirical explorations of the logic theory machine. Proc. West. Joint Comput. Conf., 1957, 11, 218-230.

NEWELL, A., SHAW, J. C., & SIMON, H. A. The elements of a theory of human problem solving. Psychol. Rev., 1958, 65, 151-166.

NEWELL, A., & SIMON, H. A. Computer simulation of human thinking. Science, 1961, 134, 2011-2017.

SCHOFIELD, W. A study of medical students with the MMPI: I. Scale norms and profile patterns. J. Psychol., 1953, 36, 59-65.

STEPHENSON, W. The study of behavior: Q-technique and its methodology. Chicago: Univer. Chicago Press, 1953.

WELSH, G. S. An Anxiety Index and an Internalization Ratio for the MMPI. J. consult. Psychol., 1952, 16, 65-72.


(Received May 14, 1963) 


APPENDIX


COLLEGE MALADJUSTMENT RULES FOR MMPI INTERPRETATION


The MMPI should be scored on 16 scales, and these include: ?, L, F, K, Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, Ma, Si, Es, and Mt (Kleinmuntz, 1962, p. 396). The latter two scales usually do not appear on the conventional MMPI profile sheet and should be notated and are reported here as raw scores. K correction is assumed for scales Hs, Pd, Pt, Sc, and Ma. All scores except for Scales Es and Mt are reported here as T scores.

Application of these rules without the aid of an electronic digital computer may be exceedingly cumbersome due to the pattern analytic approach to the decision rules themselves.A1

The following calculations will be needed:

1. Hathaway Code
2. Band location: (Pt + Sc) − (D + Hs) = beta
   Band 1: beta = −31 and less
   Band 2: beta = −31 thru −11
   Band 3: beta = −10 thru 6
   Band 4: beta = 7 thru 25
   Band 5: beta = 26 and above
3. Delta = (Pd + Pa) − (Hs + Hy)
4. Anxiety Index (AI) = (Hs + D + Hy)/3 + (D + Pt) − (Hs + Hy)
5. Internalization Ratio (IR) = (Hs + D + Pt)/(Hy + Pd + Ma)

Note: Proceed to the next rule regardless of the maladjustment versus adjustment decision. Since a tally must be kept of the number of rules that apply to an MMPI profile, the rule number must be notated.

Call maladjusted if:

1. Four or more clinical scales = 70 (Mt and Es excluded).

2. The first two scales of the Hathaway Code are among the Scales Pd or Pa or Sc and one of these = 70. If Mf is one of the first two scales in the Hathaway Code, then examine the first three scales.

3. Pa or Sc = 70 and Pa or Pt or Sc = Hs or D or Hy.

4. Pa = 70, unless Mt ≤ 6 and K = 65.

5. (Pa + Sc − 2·Pt) = 20, if Pa or Sc = 65 and if Pa and/or Sc = Pt.

6. Pd = 70 and (a) Mt = 15 (males); (b) Mt = 17 (females).

7. Pd = 70 and (a) Band 4 or 5 and Δ = 0 or (b) Band 1 or 2 and Δ ≤ 0.

8. Mt = 23 and Es ≤ 50.

9. Mt = 23 and Es ≤ 45.

10. Five or more scales = 65 and Pa or Sc = 65.

11. Male profile with Mf = 70 and Sc = 60 with Sc = Pt.

12. Sc = 70 and either Si or Pa = 60.

13. Es = 35.

14. IR = .90 Δ = −10.

15. Sc is primary elevation (first in Hathaway Code) and is = 65 and F = L and (not plus) K.

16. Band 2 profile.

17. Band 3 and IR = 1.00.

18. K = 50 and any scale except Es or Ma = 70.

19. Male profile and Mf = 65 and Pd = 63.

20. Sc = 60 and Si = 50 and AI = 60, unless Ma scale = 65.

21. Sc = 60 and Si = 50 and Ma = 70 and AI = 50.

22. Pd ≥ 63, and Hs = 48 and AI = 65.

23. Male profile and Pd = 54, Hs ≥ 58, and Si = 44.

24. Hs = 58, Hy ≤ 61.

25. Hy ≤ 61 and Pd = 63; also hold for female profile if Pd is not the primary elevation.

26. Pa and Sc > 60 if male, or > 65 if female.

27. (Hs + Hy − 2·D) = 10, Pa < 50, Pt = 50, and Mt = 10g.

28. (Mt − Es) = 4p.

Call adjusted if:

29. Mt = bp.

30. All scales = 60 except Ma = 80 and Mt = 10g.

31. D or Pt are primary elevations and D = Hs and = Hy; and Pt = Pa and = Sc; and Es = 45.

32. Mt < 10g.

33. Five scales between 40 and 60, and Es = 45.

34. (Hs + Hy − 2·D) = 20; and Pt < Pa = 70 or Mt = 10g.

35. (Mt + Es) = 0p if female, = −20, if male, unless Rule 5 calls profile maladjusted.


A1 The band locations, the beta and delta computations, the Anxiety Index and the Internalization Ratio mentioned as basic calculations were adopted from the Meehl-Dahlstrom (1960) rules. It may be helpful to the reader who is not familiar with MMPI literature to consult Dahlstrom and Welsh’s An MMPI Handbook (1960) for complete explanations of some of the indexes used in these rules.
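The basic calculations that the rules presuppose (band location, delta, the Anxiety Index, and the Internalization Ratio) can be sketched in Python as follows. This is an illustration only, not the program used in the study. The Anxiety Index and Internalization Ratio are written here in the form usually attributed to Welsh (1952), which is an assumption on our part; the function names and the sample profile are hypothetical; and because the printed band limits overlap at beta = −31, that value is assigned to Band 1 by assumption. The Hathaway Code is not computed.

```python
def beta(t):
    """Beta = (Pt + Sc) - (D + Hs), computed from T scores."""
    return (t["Pt"] + t["Sc"]) - (t["D"] + t["Hs"])


def band_location(b):
    """Map beta to Bands 1-5. Assumption: beta of exactly -31 falls in Band 1."""
    if b <= -31:
        return 1
    if b <= -11:
        return 2
    if b <= 6:
        return 3
    if b <= 25:
        return 4
    return 5


def delta(t):
    """Delta = (Pd + Pa) - (Hs + Hy)."""
    return (t["Pd"] + t["Pa"]) - (t["Hs"] + t["Hy"])


def anxiety_index(t):
    """AI = (Hs + D + Hy)/3 + (D + Pt) - (Hs + Hy) (assumed Welsh form)."""
    return (t["Hs"] + t["D"] + t["Hy"]) / 3 + (t["D"] + t["Pt"]) - (t["Hs"] + t["Hy"])


def internalization_ratio(t):
    """IR = (Hs + D + Pt) / (Hy + Pd + Ma) (assumed Welsh form)."""
    return (t["Hs"] + t["D"] + t["Pt"]) / (t["Hy"] + t["Pd"] + t["Ma"])


# Hypothetical profile with every clinical scale at T = 50 (illustrative only).
profile = {scale: 50 for scale in ("Hs", "D", "Hy", "Pd", "Pa", "Pt", "Sc", "Ma")}

print("beta:", beta(profile), "band:", band_location(beta(profile)))
print("delta:", delta(profile))
print("AI:", anxiety_index(profile))
print("IR:", internalization_ratio(profile))
```

A flat profile at T = 50 is a convenient check: beta and delta are zero, the profile falls in Band 3, and the Internalization Ratio is 1.00.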

Up until this point only tentative decisions 
have been made. The following flow chart speci- 
fies the conditions for the final clinical decisions. 
The decisions are one of three: (a) call adjusted, 
(b) call maladjusted, and (c) call unclassified. 


[Flow chart: final decision procedure. Box text: “Does Rule 34 and do Rules 5, 7 or 10 apply?” (Yes/No); “If no decision has been reached, tally the number of rules that call the profile either adjusted or maladjusted”; “Decision is made for category with greatest number of tallies”; “Disregard Rules 10, 14, 16 and 17. Do another tally”; “Disregard Rule 31”; “Call Unclassified.”]
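The tallying step of the final decision can be sketched in Python. This is a simplified illustration, not the flow chart itself: it implements only the tally and the "greatest number of tallies" decision, treats a tie as unclassified (an assumption), and leaves the order in which rules are disregarded to the caller. The function name `tally_decision` and its arguments are hypothetical.

```python
MALADJUSTED_RULES = range(1, 29)  # Rules 1-28 are listed under "Call maladjusted if"
ADJUSTED_RULES = range(29, 36)    # Rules 29-35 are listed under "Call adjusted if"


def tally_decision(applicable_rules, disregard=()):
    """Tally the rules that applied to one profile and return
    (decision, maladjusted tally, adjusted tally).

    applicable_rules: rule numbers that applied to the profile.
    disregard: rule numbers to leave out of this tally.
    Assumption: equal tallies are returned as "unclassified".
    """
    active = set(applicable_rules) - set(disregard)
    n_mal = sum(1 for rule in active if rule in MALADJUSTED_RULES)
    n_adj = sum(1 for rule in active if rule in ADJUSTED_RULES)
    if n_mal > n_adj:
        return "maladjusted", n_mal, n_adj
    if n_adj > n_mal:
        return "adjusted", n_mal, n_adj
    return "unclassified", n_mal, n_adj


# Hypothetical profiles, described only by the rule numbers that applied.
print(tally_decision([1, 4, 12, 29]))
print(tally_decision([4, 29]))
print(tally_decision([10, 14, 29, 33], disregard=(10, 14, 16, 17)))
```

The maladjusted tally returned here is also the quantity that a graded maladjustment score, of the kind suggested for further research, might be built on.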


